Generative adversarial network (GAN) is a new idea for training models, in which a generator and a discriminator compete against each other to improve the generation quality. Recently, GAN has shown amazing results in image generation, and a large amount and a wide variety of new ideas, techniques, and applications have been developed based on it. Although there are only few successful cases, GAN has great potential to be applied to text and speech generations to overcome limitations in the conventional methods.
The tutorial includes three parts. The first part provides a thorough review of GAN. We will first introduce GAN to newcomers. Then, we will introduce the approaches that aim to improve the training procedure and the variants of GAN beyond simply generating random objects. The second part of this tutorial will focus on the applications of GAN on speech. Although most techniques related to GAN are developed on image generation today, GAN can also generate speech. However, speech signals are temporal sequences which have very different nature from images. We will describe how to apply GAN on speech signal processing, including text-to-speech synthesis, voice conversion, speech enhancement, and domain adversarial training on speech-related tasks. The third part of this tutorial will focus on the applications of GAN on natural language processing. The major challenge for applying GAN on natural language is its discrete nature (words are usually represented by one-hot encodings), which makes the original GAN fails. We will review a series of approaches dealing with this problem, and finally demonstrate the applications of GAN on chat-bot, abstractive summarization, and text style transformation.