2018. 12. 11
DSP & AI Lab
Min-Jae Hwang
TOWARD WAVENET SPEECH SYNTHESIS
PROFILE
2
Min-Jae Hwang
• Affiliation : 7th semester of MS/Ph.D. student
DSP & AI Lab., Yonsei Univ., Seoul, Korea
• Adviser : Prof. Hong-Goo Kang
• E-mail : hmj234@dsp.yonsei.ac.kr
Work experience
• 2017.12 ~ 2017.12 – Intern at NAVER corporation
• 2018.01 ~ 2018.11 – Intern at Microsoft Research Asia
Research area
• 2015.8 ~ 2016.12 - Audio watermarking
• M. Hwang, J. Lee, M. Lee, and H. Kang, “Performance improvement of a spread spectrum-based audio watermarking algorithm using a pre-analysis method,” in Proc. the 33rd Conference on Speech Communication and Signal Processing, The Acoustical Society of Korea, 2016 (in Korean)
• M. Hwang, J. Lee, M. Lee, and H. Kang, “SVD-based adaptive QIM watermarking on stereo audio signals,” in IEEE Transactions on Multimedia, 2017
• 2017.1 ~ Present - Deep learning-based statistical parametric speech synthesis
• M. Hwang, E. Song, K. Byun, and H. Kang, “Modeling-by-generation structure-based noise compensation algorithm for glottal vocoding speech synthesis system,” in ICASSP, 2018
• M. Hwang, E. Song, J. Kim, and H. Kang, “A unified framework for the generation of glottal signals in deep learning-based parametric speech synthesis
systems,” in Interspeech, 2018
• M. Hwang, F. Soong, F. Xie, X. Wang, and H. Kang, “LP-WaveNet: linear prediction-based WaveNet speech synthesis,” in ICASSP, 2019 [Submitted]
CONTENTS
Introduction
• Story of WaveNet vocoder
Various types of WaveNet vocoders
• SoftMax-WaveNet
• MDN-WaveNet
• ExcitNet
• Proposed LP-WaveNet
Tuning of WaveNet
• Noise injection
• Waveform generation methods
Experiments
Conclusion & Summary
3
INTRODUCTION
Story of WaveNet vocoder
[Timeline: Traditional text-to-speech (~2015) → WaveNet (2016) → WaveNet vocoder (2017) → MDN-WaveNet → ExcitNet → LP-WaveNet (2019)]
CLASSICAL TTS SYSTEMS
Unit-selection speech synthesis [1]
• Concatenates tiny segments of real speech signals
• Pros) High quality
• Cons) Low controllability, large database
Statistical parametric speech synthesis (SPSS) [2]
• Parameterizes the speech signal using a vocoder technique
• Pros) High controllability, small database
• Cons) Low quality
5
GENERATIVE MODEL-BASED SPEECH SYNTHESIS
WaveNet [3]
• First generative model for raw audio waveforms
• Predicts the probability distribution of each waveform sample auto-regressively
• Generates high-quality audio / speech signals
• Has had impact on TTS, voice conversion, music synthesis, etc.
WaveNet in TTS task
6
$p(\mathbf{x}) = \prod_{n=1}^{N} p(x_n \mid \mathbf{x}_{<n})$
[Mean opinion scores]
• Utilizes linguistic features and the F0 contour as conditional information
[WaveNet speech synthesis]
• Presents higher quality than conventional TTS systems
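To make the factorization above concrete, here is a minimal NumPy sketch of sample-by-sample autoregressive generation; `predict_distribution` is a hypothetical stand-in for the WaveNet network, and the flat 8-level toy distribution is purely illustrative.

```python
import numpy as np

def toy_autoregressive_generate(num_samples, predict_distribution, rng=None):
    """Generate a waveform sample by sample: x_n ~ p(x_n | x_{<n}).

    `predict_distribution(history)` stands in for the network: it must return
    (candidate values, probabilities) for the next sample given past samples.
    """
    rng = np.random.default_rng(rng)
    x = []
    for _ in range(num_samples):
        values, probs = predict_distribution(np.asarray(x))
        x.append(rng.choice(values, p=probs))  # draw the next sample
    return np.asarray(x)

# Toy stand-in for the network: a flat distribution over 8 quantization levels.
def dummy_model(history):
    values = np.linspace(-1.0, 1.0, 8)
    probs = np.full(8, 1.0 / 8)
    return values, probs

waveform = toy_autoregressive_generate(16, dummy_model)
print(waveform)
```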
WAVENET VOCODER-BASED SPEECH SYNTHESIS
Utilize WaveNet as parametric vocoder [4]
• Use acoustic features as conditional information
Advantages
• Higher quality synthesized speech than the conventional vocoders
• Does not require a hand-engineered processing pipeline
• Higher controllability than with linguistic features
• Acoustic features can be controlled directly
• Higher training efficiency than with linguistic features
• Linguistic features: a 25~35-hour database
• Acoustic features: a 1-hour database
7
[WaveNet vocoder based parametric speech synthesis]
[Audio samples: Conventional vocoder, WaveNet vocoder, Recorded]
VARIOUS TYPES OF WAVENET VOCODERS
1. SoftMax-WaveNet
2. MDN-WaveNet
3. ExcitNet
4. Proposed LP-WaveNet
[Timeline: Traditional text-to-speech (~2015) → WaveNet (2016) → WaveNet vocoder (2017) → MDN-WaveNet → ExcitNet → LP-WaveNet (2019)]
BASIC OF WAVENET
Auto-regressive generative model
• Predict probability distribution of speech samples
• Use past waveform samples as conditioning information for WaveNet
Problem: Long-term dependency of the speech signal
• Speech samples are highly correlated at high sampling rates, e.g. 16,000 Hz
• E.g. 1) At least 160 (= 16000/100) speech samples are needed to represent a 100 Hz voice period correctly
• E.g. 2) On average, about 6,000 speech samples per word¹
Solution
• Feed the speech samples into a dilated causal convolution layer
• Stack the dilated causal convolution layers
• Results in an exponentially growing receptive field
9
$p(\mathbf{x}) = \prod_{n=1}^{N} p(x_n \mid \mathbf{x}_{n-R:n-1})$, where $R$ is the receptive field
[Dilated causal convolution]
1 Statistics based on the average speaking rate of a set of TED talk speakers
http://sixminutes.dlugan.com/speaking-rate/
An effective method for embedding a long receptive field is required
BASIC OF WAVENET
WaveNet without dilation
10
[WaveNet without dilated convolution]
Receptive field: # layer - 1
BASIC OF WAVENET
WaveNet with dilation
11
[WaveNet with dilated convolution]
Receptive field: $2^{\#\,\mathrm{layers}} - 1$
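A rough sketch for sanity-checking receptive-field sizes, assuming kernel-size-2 causal convolutions where each layer widens the receptive field by its dilation (edge layers may be counted differently than in the slide formulas). With the 3 × [1, 2, …, 512] dilation pattern used later in the experiments it yields 3,070 samples.

```python
def receptive_field(dilations, kernel_size=2):
    """Receptive field of stacked causal convolutions.

    Each layer with dilation d and kernel size k widens the receptive
    field by (k - 1) * d; the input sample itself contributes 1.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Non-dilated stack: the receptive field grows only linearly with depth.
print(receptive_field([1] * 10))            # 11 samples

# Dilations doubling 1..512, repeated 3 times (the experimental setup):
dilations = 3 * [2 ** i for i in range(10)]
print(receptive_field(dilations))           # 3070 samples
```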
BASIC OF WAVENET
12
A stack of multiple residual blocks
1. Dilated causal convolution
• Exponentially increase the receptive field
2. Gated activation
• Impose non-linearity on the model
• Enable conditional WaveNet
3. Residual / skip connections
• Speed up the convergence
• Enable deep layered model training
$\mathbf{z} = \tanh(W_f * \mathbf{x} + V_f * \mathbf{h}) \odot \operatorname{sigmoid}(W_g * \mathbf{x} + V_g * \mathbf{h})$
$y[d, n] = \sum_{k=0}^{M-1} h[k] \cdot x[n - d \cdot k]$
[Basic WaveNet architecture]
x: input of activation, h: conditional vector, z: output of activation
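The gated activation and residual/skip connections above can be sketched as follows; this is a toy NumPy residual block with randomly initialized weights, not the actual WaveNet implementation (which also applies 1×1 convolutions on the residual and skip paths).

```python
import numpy as np

def causal_dilated_conv(x, h_filt, dilation):
    """y[n] = sum_k h_filt[k] * x[n - dilation * k], zero-padded on the left."""
    y = np.zeros_like(x)
    for k, w in enumerate(h_filt):
        shift = dilation * k
        if shift == 0:
            y += w * x
        else:
            y[shift:] += w * x[:-shift]
    return y

def gated_residual_block(x, cond, dilation, rng):
    """z = tanh(Wf*x + Vf*h) ⊙ sigmoid(Wg*x + Vg*h); returns residual and skip outputs."""
    Wf, Wg = rng.normal(size=(2, 2)) * 0.1       # filter taps (kernel size 2)
    Vf, Vg = rng.normal(size=2) * 0.1            # 1x1 projections of the condition
    z = np.tanh(causal_dilated_conv(x, Wf, dilation) + Vf * cond) * \
        (1.0 / (1.0 + np.exp(-(causal_dilated_conv(x, Wg, dilation) + Vg * cond))))
    residual = x + z     # residual connection feeding the next block
    skip = z             # skip connection gathered at the output
    return residual, skip

rng = np.random.default_rng(0)
x = rng.normal(size=64)          # toy input samples
cond = rng.normal(size=64)       # toy per-sample conditioning feature
out, skip = gated_residual_block(x, cond, dilation=4, rng=rng)
print(out.shape, skip.shape)
```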
SOFTMAX-WAVENET
Use WaveNet as multinomial logistic regression model [4]
• 𝜇-law companding to evenly distribute the speech sample values
• 8-bit one-hot encoding
• 256 symbols
Estimate each symbol using WaveNet
• Predict the sample by SoftMax distribution
Optimize network by cross-entropy (CE) loss
• Minimize the probabilistic distance between $p_n$ and $q_n$
13
$y = \operatorname{sign}(x) \cdot \dfrac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}, \quad \mu = 255$
$\mathbf{p}_{8\text{-}\mathrm{bit}} = \operatorname{OneHot}(y)$
$q_n = \dfrac{\exp(z_n^q)}{\sum_i \exp(z_i^q)}, \quad \mathbf{z}^q = \operatorname{WaveNet}(\mathbf{q}_{<n} \mid \mathbf{h})$
$L = -\sum_n p_n \log[q_n]$
[Distribution of speech samples]
[SoftMax-WaveNet]
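A small NumPy sketch of the 𝜇-law companding and 8-bit symbol mapping described above (helper functions of my own; the resulting symbols would serve as targets for the 256-way SoftMax).

```python
import numpy as np

def mu_law_encode(x, mu=255):
    """Compress x in [-1, 1] with mu-law and quantize to 256 symbols (8 bits)."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)   # companding to [-1, 1]
    return np.clip((y + 1.0) / 2.0 * mu + 0.5, 0, mu).astype(np.int64)

def mu_law_decode(symbols, mu=255):
    """Map 8-bit symbols back to waveform amplitudes in [-1, 1]."""
    y = 2.0 * symbols.astype(np.float64) / mu - 1.0
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

x = np.sin(2 * np.pi * np.linspace(0, 1, 100))   # toy waveform
symbols = mu_law_encode(x)                       # 256-way SoftMax targets
x_hat = mu_law_decode(symbols)
print(symbols.min(), symbols.max(), np.max(np.abs(x - x_hat)))
```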
SOFTMAX-WAVENET
Limitation due to the use of 8-bit quantization
• Noisy synthetic speech due to an insufficient number of quantization bits
Intuitive solution
• Expand the SoftMax dimension to 65,536, corresponding to 16-bit quantization
→ High computational cost & difficult to train
Mixture density network (MDN)-based solution [5]
• Train the WaveNet to predict the parameters of a pre-defined speech distribution
14
MDN-WAVENET
Define the distribution of waveform sample as parameterized form [5]
• Discretized mixture of logistic (MoL) distribution
• Discretized logistic mixture with 16-bit quantization ($\Delta = 1/2^{15}$)
Estimate mixture parameters by WaveNet
Optimize network by negative log-likelihood (NLL) loss
15
$\sigma(\cdot)$: logistic (sigmoid) function
$L = -\sum_n \log\left[ p(x_n \mid \mathbf{x}_{<n}, \mathbf{h}) \right]$
$p(x_n) = \sum_{i=1}^{N} \pi_i \left[ \sigma\!\left(\dfrac{x_n + \Delta/2 - \mu_i}{s_i}\right) - \sigma\!\left(\dfrac{x_n - \Delta/2 - \mu_i}{s_i}\right) \right]$
$[\mathbf{z}^{\pi}, \mathbf{z}^{\mu}, \mathbf{z}^{s}] = \operatorname{WaveNet}(\mathbf{x}_{<n}, \mathbf{h})$
$\boldsymbol{\pi} = \operatorname{softmax}(\mathbf{z}^{\pi})$, for unity-summed mixture gains
$\boldsymbol{\mu} = \mathbf{z}^{\mu}$
$\mathbf{s} = \exp(\mathbf{z}^{s})$, for positive mixture scales
Mixture density network
[MDN-WaveNet]
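As a sketch of the discretized MoL likelihood and NLL loss above, assuming 16-bit quantization of a [-1, 1] waveform and omitting the edge-bin handling used in practical implementations:

```python
import numpy as np

def discretized_mol_nll(x, log_pi, mu, log_s, num_bits=16):
    """Negative log-likelihood of samples under a discretized mixture of logistics.

    x:       target samples in [-1, 1], shape (T,)
    log_pi:  unnormalized mixture logits, shape (T, K)  -> pi = softmax(log_pi)
    mu:      mixture means, shape (T, K)
    log_s:   log mixture scales, shape (T, K)           -> s = exp(log_s)
    """
    delta = 2.0 / (2 ** num_bits)                 # quantization bin width
    pi = np.exp(log_pi - log_pi.max(axis=-1, keepdims=True))
    pi /= pi.sum(axis=-1, keepdims=True)          # softmax -> gains sum to one
    s = np.exp(log_s)                             # scales are positive
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    x = x[:, None]
    cdf_plus = sigmoid((x + delta / 2 - mu) / s)  # CDF at the upper bin edge
    cdf_minus = sigmoid((x - delta / 2 - mu) / s) # CDF at the lower bin edge
    prob = np.sum(pi * (cdf_plus - cdf_minus), axis=-1)
    return -np.mean(np.log(np.maximum(prob, 1e-12)))

rng = np.random.default_rng(0)
T, K = 200, 10                                    # 10 mixtures, as in the experiments
x = 0.5 * np.sin(np.linspace(0, 8 * np.pi, T))
loss = discretized_mol_nll(x, rng.normal(size=(T, K)),
                           rng.normal(size=(T, K)) * 0.1,
                           rng.normal(size=(T, K)) - 2.0)
print(loss)
```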
MDN-WAVENET
Higher quality than SoftMax-WaveNet
• Enables modeling the speech signal with 16 bits
• Overcomes the difficulty of spectrum modeling
Difficulty of WaveNet training
• Due to the quantization depth increasing from 8 to 16 bits
Solution based on the human speech production model [6]
• Model the vocal source signal, whose physical behavior is much simpler than that of the speech signal
16
[Spectrogram comparison: SoftMax-WaveNet, MDN-WaveNet, Recorded] [Loss comparison]
SPEECH PRODUCTION MODEL
Concept
• Model speech as the output of a vocal source signal filtered by the vocal tract filter
Methodology: Linear prediction (LP) approach
• Define the speech signal as a linear combination of past speech samples plus a residual
17
$S(z) = [G(z) \cdot V(z) \cdot R(z)] \cdot E(z)$
[Speech production model] [Spectral deconvolution by LP analysis]
$s_n = \sum_{i=1}^{P} \alpha_i s_{n-i} + e_n$
$S(z) = H(z) \cdot E(z), \text{ where } H(z) = \dfrac{1}{1 - \sum_{i=1}^{P} \alpha_i z^{-i}}$
Spectrum part = LP coefficients
Excitation part = residual signal of LP analysis
Speech = [vocal fold × vocal tract × lip radiation] × excitation
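A rough NumPy sketch of LP analysis via the autocorrelation method (with a small regularizer of my own for numerical stability) and of residual extraction, illustrating the two equations above:

```python
import numpy as np

def lp_coefficients(frame, order=16):
    """Estimate LP coefficients a_i such that s_n ≈ sum_i a_i * s_{n-i}
    by solving the autocorrelation normal equations (Toeplitz system)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])

def lp_residual(signal, alpha):
    """e_n = s_n - sum_i alpha_i * s_{n-i}  (LP analysis / inverse filtering)."""
    e = signal.copy()
    for i, a in enumerate(alpha, start=1):
        e[i:] -= a * signal[:-i]
    return e

rng = np.random.default_rng(0)
n = np.arange(1024)
s = np.sin(2 * np.pi * 100 * n / 16000) + 0.05 * rng.normal(size=n.size)  # toy "speech"
alpha = lp_coefficients(s, order=16)
e = lp_residual(s, alpha)
print("residual energy / signal energy:", np.sum(e ** 2) / np.sum(s ** 2))
```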
EXCITNET
Model the excitation signal by WaveNet, instead of speech signal [6]
Training stage
• Extract the excitation signal by linear prediction (LP) analysis
• Filter coefficients are periodically updated at the frame rate of the acoustic features
• Train WaveNet to model the excitation signal
Synthesis stage
• Generate the excitation signal with WaveNet
• Synthesize the speech signal by LP synthesis filtering
18
$e_n = s_n - \sum_{i=1}^{P} \alpha_i s_{n-i}$
$s_n = \sum_{i=1}^{P} \alpha_i s_{n-i} + e_n$
[ExcitNet]
[Audio samples: Recorded, Excitation, LP synthesis]
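To make the analysis/synthesis pair concrete, the toy NumPy round trip below uses fixed, hand-picked LP coefficients rather than frame-wise coefficients derived from acoustic features: the inverse filter extracts the excitation that ExcitNet would model, and the recursive synthesis filter recovers the speech from an (here error-free) excitation.

```python
import numpy as np

def lp_synthesis(excitation, alpha):
    """Reconstruct s_n = sum_i alpha_i * s_{n-i} + e_n (recursive LP synthesis filter)."""
    p = len(alpha)
    s = np.zeros_like(excitation)
    for n in range(len(excitation)):
        past = s[max(0, n - p):n][::-1]                 # s_{n-1}, s_{n-2}, ...
        s[n] = excitation[n] + np.dot(alpha[:len(past)], past)
    return s

def lp_analysis(signal, alpha):
    """e_n = s_n - sum_i alpha_i * s_{n-i}  (inverse filter, as in the training stage)."""
    e = signal.copy()
    for i, a in enumerate(alpha, start=1):
        e[i:] -= a * signal[:-i]
    return e

rng = np.random.default_rng(1)
s = np.cumsum(rng.normal(size=512)) * 0.01              # toy correlated signal
alpha = np.array([1.3, -0.4])                           # toy, hand-picked LP coefficients
e = lp_analysis(s, alpha)                               # excitation target for WaveNet
s_rec = lp_synthesis(e, alpha)                          # synthesis-stage filtering
print("max reconstruction error:", np.max(np.abs(s - s_rec)))
```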
EXCITNET
High-quality synthesized speech even though 8-bit quantization is used
• Overcomes the difficulty of spectrum modeling by using an external spectral shaping filter
Limitation
• Independent modeling of the excitation and spectrum parts results in unpredictable prediction errors
19
$S(z) = \dfrac{1}{1 - \sum_{i=1}^{P} \alpha_i z^{-i}} \cdot E(z)$, where both the LP filter and the generated excitation contain errors
Objective
Represent LP synthesis process during WaveNet training / generation
[Audio samples: Recorded, SoftMax-WaveNet, ExcitNet]
LP-WAVENET
Motivation from the assumptions of the WaveNet vocoder
1. Previous speech samples, $\mathbf{x}_{<n}$, are given
2. LP coefficients, $\{\alpha_i\}$, are given
→ Their linear combination, $\hat{x}_n = \sum_{i=1}^{P} \alpha_i x_{n-i}$, is also given
Probabilistic analysis
• $x_n = e_n + \hat{x}_n$, so the difference between the random variables $X_n$ and $E_n$ is only the constant value $\hat{x}_n$
• $X_n \mid (\mathbf{x}_{<n}, \mathbf{h}) = (E_n + \hat{x}_n) \mid (\mathbf{x}_{<n}, \mathbf{h}) = E_n \mid (\mathbf{x}_{<n}, \mathbf{h}) + \hat{x}_n$
Assume the speech samples follow a discretized MoL distribution
• By the shifting property of a random variable, $p(x_n \mid \mathbf{x}_{<n}, \mathbf{h})$ and $p(e_n \mid \mathbf{x}_{<n}, \mathbf{h})$ differ only in their mean parameters:
$\pi_i^x = \pi_i^e, \quad \mu_i^x = \mu_i^e + \hat{x}_n, \quad s_i^x = s_i^e$
20
[Difference of distributions between speech and excitation]
LP-WAVENET
Utilize the causality of WaveNet and the linearity of LP synthesis processes [7]
21
1. Mixture parameter prediction
$[\mathbf{z}^{\pi}, \mathbf{z}^{\mu}, \mathbf{z}^{s}] = \operatorname{WaveNet}(\mathbf{x}_{<n}, \mathbf{h})$
2. Mixture parameter refinement
$\boldsymbol{\pi} = \operatorname{softmax}(\mathbf{z}^{\pi}), \quad \boldsymbol{\mu} = \mathbf{z}^{\mu} + \hat{x}_n, \quad \mathbf{s} = \exp(\mathbf{z}^{s})$
3. Discretized MoL loss calculation
$p(x_n \mid \mathbf{x}_{<n}, \mathbf{h}) = \sum_{i=1}^{N} \pi_i \left[ \sigma\!\left(\dfrac{x_n + \Delta/2 - \mu_i}{s_i}\right) - \sigma\!\left(\dfrac{x_n - \Delta/2 - \mu_i}{s_i}\right) \right]$
$L = -\sum_n \log\left[ p(x_n \mid \mathbf{x}_{<n}, \mathbf{h}) \right]$
[LP-WaveNet]
[Audio samples: Recorded, ExcitNet, LP-WaveNet]
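The refinement step amounts to shifting every predicted mixture mean by the LP prediction of the current sample; a short sketch with toy LP coefficients and toy network outputs (the likelihood and loss are then computed exactly as in MDN-WaveNet):

```python
import numpy as np

def refine_mixture_parameters(z_pi, z_mu, z_s, x_past, alpha):
    """LP-WaveNet-style parameter refinement (step 2 above, sketched):

    pi = softmax(z_pi), mu = z_mu + x_hat_n, s = exp(z_s),
    where x_hat_n = sum_i alpha_i * x_{n-i} is the LP prediction of the
    current sample from already-available past samples.
    """
    x_hat = float(np.dot(alpha, x_past[-len(alpha):][::-1]))   # x_{n-1}, x_{n-2}, ...
    pi = np.exp(z_pi - z_pi.max())
    pi /= pi.sum()
    mu = z_mu + x_hat        # shift every mixture mean by the LP prediction
    s = np.exp(z_s)          # positive scales
    return pi, mu, s, x_hat

rng = np.random.default_rng(0)
alpha = np.array([1.2, -0.3])                 # toy LP coefficients
x_past = np.array([0.10, 0.12, 0.15])         # previously generated samples
z_pi, z_mu, z_s = rng.normal(size=(3, 10))    # toy WaveNet outputs, 10 mixtures
pi, mu, s, x_hat = refine_mixture_parameters(z_pi, z_mu, z_s, x_past, alpha)
print("LP prediction:", x_hat, " first refined mean:", mu[0])
```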
TUNING OF WAVENET
1. Solution to waveform divergence problem
2. Waveform generation methods
WAVEFORM DIVERGENCE PROBLEM
Waveform divergence problem
• The waveform diverges due to errors accumulated during WaveNet's autoregressive generation
Cause – Overfitting to the silence region
• The silence region has a unique solution
• More likely to happen when the training set contains a larger portion of silence
• Generation is sensitive to tiny errors in the silence region
Solution – Noise injection
• Inject a negligible amount of noise
• Allow only a 1-bit error
• Increases robustness to prediction errors in the silence region
23
$x_{\mathrm{curr}} = \operatorname{WaveNet}(\mathbf{x}_{\mathrm{prev}}, \mathbf{h}), \qquad 0 = \operatorname{WaveNet}(\mathbf{0}, \mathbf{h}_{\mathrm{sil}})$
$\hat{\mathbf{x}} = \mathbf{x} + \epsilon \cdot \mathbf{n}, \text{ where } \epsilon = 2 \cdot 2^{-16}$
$x_0 \sim p(x_0 \mid \mathbf{x}_{<0}), \; x_1 \sim p(x_1 \mid \mathbf{x}_{<1}), \; x_2 \sim p(x_2 \mid \mathbf{x}_{<2}), \; \ldots, \; x_n \sim p(x_n \mid \mathbf{x}_{<n})$
$err_0 \;\rightarrow\; err_0 + err_1 \;\rightarrow\; \ldots \;\rightarrow\; \textstyle\sum_i err_i$
[Error stacking during waveform generation]
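A sketch of the noise injection above; the noise magnitude follows the slide (about one 16-bit quantization step), while the uniform noise distribution is an assumption made here for illustration.

```python
import numpy as np

def inject_training_noise(x, num_bits=16, rng=None):
    """Add a negligible amount of noise to the training waveform:
    x_hat = x + eps * n, with eps = 2 * 2**-16 (about one 16-bit quantization
    step), so the perturbation stays within roughly a 1-bit error."""
    rng = np.random.default_rng(rng)
    eps = 2.0 * 2.0 ** (-num_bits)
    return x + eps * rng.uniform(-1.0, 1.0, size=x.shape)

x = np.zeros(16000)                      # a silent training segment
x_noisy = inject_training_noise(x, rng=0)
print("max perturbation:", np.max(np.abs(x_noisy - x)))   # on the order of 3e-5
```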
WAVEFORM GENERATION METHODS
Random sampling
• $\hat{x}_{\mathrm{rand}}(n) \sim p(x(n) \mid x(k < n), \mathbf{h})$, i.e., draw from $\sum_{i=1}^{I} \pi_i \, \mathrm{Logistic}(\mu_i, s_i)$
Argmax sampling
• $\hat{x}_{\max}(n) = \arg\max_{x(n)} p(x(n) \mid x(k < n), \mathbf{h})$
Greedy sampling [8]
• $\hat{x}_{\mathrm{greedy}}(n) = vuv \cdot \hat{x}_{\max}(n) + (1 - vuv) \cdot \hat{x}_{\mathrm{rand}}(n)$
Mode sampling [7]
• Sharper distribution in the voiced region: $\hat{x}_{\mathrm{mode}}(n) = vuv \cdot \hat{x}_{\mathrm{rand,nar}}(n) + (1 - vuv) \cdot \hat{x}_{\mathrm{rand}}(n)$, where $\hat{x}_{\mathrm{rand,nar}}(n) \sim \sum_{i=1}^{I} \pi_i \, \mathrm{Logistic}(\mu_i, s_i / c)$
24
[Random & argmax sampling] [Scale parameter control in mode sampling]
[Audio samples: Recorded, Random sampling, Argmax sampling, Greedy sampling, Mode sampling]
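A compact sketch of the four generation strategies, under two simplifying assumptions of my own: the arg-max of the mixture is approximated by the mean of its most probable component, and the mode-sampling scale divisor c is treated as a free parameter.

```python
import numpy as np

def sample_logistic_mixture(pi, mu, s, scale_div=1.0, rng=None):
    """Draw one sample from sum_i pi_i * Logistic(mu_i, s_i / scale_div)."""
    rng = np.random.default_rng(rng)
    i = rng.choice(len(pi), p=pi)                       # pick a mixture component
    u = rng.uniform(1e-5, 1.0 - 1e-5)
    return mu[i] + (s[i] / scale_div) * np.log(u / (1.0 - u))   # inverse-CDF sampling

def generate_sample(pi, mu, s, vuv, method="mode", scale_div=2.0, rng=None):
    """Random / argmax / greedy / mode sampling for one time step.

    vuv is the binary voiced(1)/unvoiced(0) flag; the arg-max is approximated
    here by the mean of the most probable mixture component.
    """
    rng = np.random.default_rng(rng)
    x_rand = sample_logistic_mixture(pi, mu, s, rng=rng)
    x_max = mu[int(np.argmax(pi))]                       # distribution peak (approx.)
    if method == "random":
        return x_rand
    if method == "argmax":
        return x_max
    if method == "greedy":                               # argmax on voiced, random on unvoiced
        return vuv * x_max + (1 - vuv) * x_rand
    # mode sampling: sharpen the distribution on voiced frames by shrinking the scales
    x_rand_narrow = sample_logistic_mixture(pi, mu, s, scale_div=scale_div, rng=rng)
    return vuv * x_rand_narrow + (1 - vuv) * x_rand

pi = np.array([0.7, 0.3]); mu = np.array([0.1, -0.2]); s = np.array([0.05, 0.1])
for m in ("random", "argmax", "greedy", "mode"):
    print(m, generate_sample(pi, mu, s, vuv=1, method=m, rng=0))
```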
EXPERIMENTS
EXPERIMENTS
Network architecture
Systems
• WN_S: MDN-WaveNet that models the speech signal
• WN_E: MDN-WaveNet that models the excitation signal
• WN_LP: the proposed LP-WaveNet
26
Database: Korean male (MBC YBS database)
Training / validation / test: 2,500 (~3.2 h) / 200 / 200
Minibatch size: 4 GPUs × 20,000 samples
Dilations: 3 × [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
Layers: 30
Receptive field: 3,070 samples
Residual channels: 128
Skip channels: 128
Quantization: 65,536 levels (uniform)
Number of mixtures: 10 mixture components → 30 output channels
Conditioning features: voicing flag, LSF, F0 (log), energy (log), BAP
Normalization: Gaussian normalization
Learning rate: 1e-4
Initialization / optimizer: Xavier / Adam
Sample generation method: mode sampling
LEARNING CURVE
27
Training speed
WN_LP = WN_E > WN_S
OBJECTIVE EVALUATION
Performance measurements
• VUV: Voicing error rate (%)
• F0 RMSE: F0 root mean square error (Hz) in voiced region
• LSD: Log-spectral distance of AR spectrum (dB)
• F-LSD: LSD of synthesized speech in frequency domain (dB)
Results
28
A/S: analysis / synthesis
SPSS: synthesis by predicted acoustic features
SUBJECTIVE EVALUATION
Mean opinion score (MOS) test
• 20 random synthesized utterances from test set
• 12 native Korean speakers
• Include STRAIGHT (STR)-based speech synthesis system as baseline [9]
Results
29
A/S: analysis / synthesis
SPSS: synthesis by predicted acoustic features
SAMPLES
Recorded
STRAIGHT – A/S
WN_S – A/S
WN_E – A/S
WN_LP – A/S
30
STRAIGHT – SPSS
WN_S – SPSS
WN_E – SPSS
WN_LP – SPSS
A/S: analysis / synthesis
SPSS: synthesis by predicted acoustic features
SUMMARY & CONCLUSION
Investigated various types of WaveNet vocoder
• SoftMax-WaveNet
• MDN-WaveNet
• ExcitNet
Proposed LP-WaveNet
• Explicitly represents the linear prediction structure of the speech waveform within the WaveNet framework
Introduced tuning methods of WaveNet training / generation
• Noise injection
• Sample generation methods
Performed objective and subjective experiments
• Achieved faster convergence than conventional WaveNet vocoders while keeping similar speech quality
31
REFERENCES
[1] A. J. Hunt and A. W. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in
Proc. ICASSP, 1996
[2] H. Zen, et al., “Statistical parametric speech synthesis using deep neural networks,” in Proc. ICASSP, 2013.
[3] A. van den Oord, et al., “WaveNet: A generative model for raw audio generation,” in CoRR, 2016.
[4] A. Tamamori, et al., “Speaker-dependent WaveNet vocoder,” in INTERSPEECH, 2017.
[5] A. van den Oord, et al., “Parallel WaveNet: Fast high-fidelity speech synthesis,” in CoRR, 2017.
[6] E. Song, K. Byun, and H.-G. Kang, “A neural excitation vocoder for statistical parametric speech synthesis systems,” in
arXiv, 2018.
[7] M. Hwang, F. Soong, F. Xie, X. Wang, and H. Kang, “LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis,”
Submitted to Proc. ICASSP, 2019.
[8] X. Wang, et al., “A comparison of recent waveform generation and acoustic modeling methods for neural-network-based
speech synthesis,” in Proc. ICASSP, 2018.
[9] H. Kawahara, “STRAIGHT, exploration of the other aspect of vocoder: Perceptually isomorphic decomposition of speech
sounds,” Acoustical Science and Technology, 2006.
32
Thank you!
33
