Progress on Bangla Text-To-Speech System by Dr. M. Shahidur Rahman
1. Progress on Bangla Text-To-Speech System
Presented By:
Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
rahmanms@sust.edu
2. Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of Bangla TTS
• Discussion…
3. What is TTS?
• The goal of text-to-speech (TTS) synthesis is to convert arbitrary input text into intelligible, natural-sounding speech
– TTS is not a “cut-and-paste” approach that strings together isolated words
– Instead, TTS uses linguistic analysis to infer the correct pronunciation and prosody (the NLP front end), and acoustic representations of speech to generate the waveform (the DSP back end)
4. TTS Applications
Applications:
Services for the visually impaired community
Services for illiterate people and people with reading difficulties
Enabling the use of computers and IT services, e.g.:
Reading email aloud
Using a word processor
Using the Internet
Existing TTS Systems:
Festival
Bell Labs TTS
6. Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively few in number (39 phonemes in English)
• Disadvantage
– Concatenating isolated phonemes ignores the transitional sounds between adjacent phonemes
7. Different TTS Systems (cont’d)
Diphone-Based TTS System:
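Diphones are:
– Units spanning two phonemes, from the middle of one phoneme to the middle of the next
– Able to capture the transitional sound between phonemes
– Able to produce better-sounding speech
– Ex. কক = ক + কঅ + অক + ক
Disadvantage:
• Large inventory: over 1500 diphones in English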
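The phoneme-to-diphone decomposition in the example above can be sketched as follows. This is a minimal illustration, not the system's actual converter: the silence symbol `_` and the `a-b` unit naming are assumptions made here so that the word-initial and word-final units (the bare ক at each end of the slide's example) become explicit silence-to-phoneme and phoneme-to-silence transitions.

```python
def to_diphones(phonemes, sil="_"):
    """Turn a phoneme sequence into diphone unit names.

    The word is padded with an (assumed) silence symbol on both sides, so
    the first and last units cover the transitions into and out of silence,
    mirroring the slide's example কক = ক + কঅ + অক + ক.
    """
    padded = [sil] + list(phonemes) + [sil]
    # each adjacent pair of phonemes becomes one diphone
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


print(to_diphones(["ক", "অ", "ক"]))  # ['_-ক', 'ক-অ', 'অ-ক', 'ক-_']
```

The same pairing explains the inventory-size problem: with 39 English phonemes plus silence there are at most 40 × 40 = 1600 ordered pairs, consistent with "over 1500 diphones".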
8. Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations, etc., into its fully written-out word form
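A toy version of this normalization step is sketched below, under loud assumptions: numbers are read digit by digit (real Bangla number expansion needs the full, largely irregular numeral names up to 99 plus place-value words), and the one-entry abbreviation table `ABBREVIATIONS` is hypothetical. Both ASCII digits and Bangla digits (U+09E6 to U+09EF) are accepted, because Python's `int()` converts any single Unicode decimal digit.

```python
# Bangla names for the digits 0-9 (this sketch reads numbers digit by digit).
DIGIT_WORDS = ["শূন্য", "এক", "দুই", "তিন", "চার", "পাঁচ", "ছয়", "সাত", "আট", "নয়"]

# Hypothetical abbreviation table; a real system needs a curated list.
ABBREVIATIONS = {"ড.": "ডক্টর"}


def normalize(text):
    """Expand abbreviations and spell out all-digit tokens as words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdecimal():
            # int() accepts any Unicode decimal digit, so "৫" and "5" both work
            words.extend(DIGIT_WORDS[int(ch)] for ch in token)
        else:
            words.append(token)  # everything else passes through unchanged
    return " ".join(words)


print(normalize("ড. ৫"))  # ডক্টর পাঁচ
```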
9. Word-to-Diphone Converter (Phonetization)
Purpose
– Translate words into their diphone representations
(Ex. রাজা → Diphones: {র + রআ + আজ + জআ})
– Mark the text with prosodic units such as phrases, clauses, and sentences
Resource
– Dictionary of words and their diphones
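Both purposes listed above, dictionary lookup and prosodic marking, can be combined in a small sketch. The punctuation sets used as sentence and phrase boundaries (Bangla dari `।`, `?`, `!`, and `,` `;` `:`) are assumptions, and the `lexicon` argument stands in for the dictionary of words and their diphones. Unknown words simply raise `KeyError` here; a production system would fall back to letter-to-sound rules.

```python
import re

# Assumed boundary sets: sentence-final marks (Bangla dari '।', '?', '!')
# and phrase/clause breaks inside a sentence.
SENTENCE_END = re.compile(r"[।?!]+")
PHRASE_BREAK = re.compile(r"[,;:]+")


def phonetize(text, lexicon):
    """Return sentences -> phrases -> one diphone list per word.

    `lexicon` maps a word to its diphones, e.g. the (hypothetical) entry
    {"রাজা": ["র", "রআ", "আজ", "জআ"]}. Unknown words raise KeyError.
    """
    sentences = []
    for sent in SENTENCE_END.split(text):
        phrases = []
        for phrase in PHRASE_BREAK.split(sent):
            words = phrase.split()
            if words:
                phrases.append([lexicon[w] for w in words])
        if phrases:  # skip the empty piece after a final '।'
            sentences.append(phrases)
    return sentences


print(phonetize("রাজা।", {"রাজা": ["র", "রআ", "আজ", "জআ"]}))
```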
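14. PSOLA – Pitch Synchronous Overlap and Add
• Windowed frames (two pitch periods long, centered on pitch marks) are overlapped and added
• 50% overlap: original pitch
• Pitch up: overlap > 50%
• Pitch down: overlap < 50%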
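The overlap rule above corresponds to time-domain PSOLA (TD-PSOLA). The sketch below is a minimal illustration, assuming the pitch marks are already known (pitch-mark estimation is not shown) and the pitch period is roughly constant. Each frame is a Hann window two periods long; placing frames one period apart gives 50% overlap and the original pitch, while spacing them `period / factor` apart raises the pitch (more overlap) for `factor > 1` and lowers it (less overlap) for `factor < 1`, keeping the same overall duration.

```python
import numpy as np


def psola_pitch_shift(x, marks, factor):
    """TD-PSOLA sketch: scale pitch by `factor`, keeping duration.

    `marks` are assumed, sorted pitch-mark sample indices (at least two).
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks)
    period = int(np.median(np.diff(marks)))  # assume a roughly constant period
    win = np.hanning(2 * period + 1)         # two-period analysis window
    y = np.zeros(len(x))
    new_period = period / factor             # closer frames -> higher pitch
    t = float(marks[0])
    while t < marks[-1]:
        # the nearest analysis mark supplies the frame to overlap-add
        k = marks[np.argmin(np.abs(marks - t))]
        c = int(round(t))
        if k - period >= 0 and k + period + 1 <= len(x) \
                and c - period >= 0 and c + period + 1 <= len(y):
            y[c - period:c + period + 1] += x[k - period:k + period + 1] * win
        t += new_period
    return y
```

For example, on a pulse train with a period of 100 samples, `psola_pitch_shift(x, marks, 2.0)` places output pulses 50 samples apart, doubling the pitch.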
15. Altering Duration
• Increase the number of PSOLA iterations (overlap-added frames, by repeating pitch periods) to increase duration
• Decrease the number of PSOLA iterations (by skipping pitch periods) to decrease duration
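A minimal sketch of duration change with PSOLA, under the same assumptions as a basic TD-PSOLA (known pitch marks, roughly constant period): output frames stay one period apart, so the pitch is unchanged, but each output frame is mapped back to an input frame so that periods are repeated when `ratio > 1` and skipped when `ratio < 1`. The function name and the `ratio` parameter are illustrative, not from the original system.

```python
import numpy as np


def psola_stretch(x, marks, ratio):
    """Change duration by `ratio` (>1 lengthens) at constant pitch.

    Pitch periods are repeated (ratio > 1) or skipped (ratio < 1);
    `marks` are assumed, known pitch-mark sample indices.
    """
    x = np.asarray(x, dtype=float)
    marks = np.asarray(marks)
    period = int(np.median(np.diff(marks)))
    win = np.hanning(2 * period + 1)
    n_out = int(round(len(marks) * ratio))          # number of output frames
    y = np.zeros((n_out + 1) * period + 1)
    for j in range(n_out):
        # map output frame j back to an input frame (repeat or skip periods)
        k = marks[min(int(j / ratio), len(marks) - 1)]
        if k - period < 0 or k + period + 1 > len(x):
            continue
        c = period * (j + 1)                        # one period apart -> same pitch
        y[c - period:c + period + 1] += x[k - period:k + period + 1] * win
    return y
```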
16. Altering Amplitude
• Multiply the signal by a constant
– If the constant > 1, the amplitude increases
– If the constant < 1, the amplitude decreases
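For 16-bit PCM audio, the constant multiplication above needs one practical guard: gains above 1 can push samples past the int16 range, so this sketch clips instead of letting values wrap around. The int16 sample format and the decibel helper are assumptions for illustration.

```python
import numpy as np


def scale_amplitude(x, gain):
    """Multiply 16-bit PCM samples by a constant (>1 louder, <1 softer).

    Results are clipped to the int16 range to avoid wrap-around distortion.
    """
    y = np.asarray(x, dtype=np.float64) * gain
    return np.clip(np.round(y), -32768, 32767).astype(np.int16)


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)
```

For example, `db_to_gain(-6.0)` is about 0.5, roughly halving the amplitude.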
17. Concatenation
Diphones → Word
• PSOLA is used at the joining ends
• Ensures a smooth transition
Words → Sentence
• Straight joining at the end points, since the pauses between words make smoothing unnecessary
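The two joining strategies above can be contrasted in a short sketch. Note that it swaps in a simpler technique for the diphone joins: a linear crossfade over `overlap` samples rather than the PSOLA-based smoothing the slide describes. The explicit silent gap inserted between words is likewise an assumption; it stands in for the natural pauses that make straight joining acceptable.

```python
import numpy as np


def join_diphones(units, overlap):
    """Diphones -> word: join waveforms with a linear crossfade.

    NOTE: a simpler stand-in for PSOLA smoothing at the joins.
    Every unit must be at least `overlap` samples long.
    """
    assert overlap > 0
    out = np.asarray(units[0], dtype=float)
    fade_in = np.linspace(0.0, 1.0, overlap)
    for u in units[1:]:
        u = np.asarray(u, dtype=float)
        mixed = out[-overlap:] * (1.0 - fade_in) + u[:overlap] * fade_in
        out = np.concatenate([out[:-overlap], mixed, u[overlap:]])
    return out


def join_words(words, pause):
    """Words -> sentence: straight concatenation with an (assumed) silent gap."""
    parts = []
    for i, w in enumerate(words):
        if i:
            parts.append(np.zeros(pause))
        parts.append(np.asarray(w, dtype=float))
    return np.concatenate(parts)
```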
19. Types of Concatenative Speech Synthesis
• Concatenative synthesis with a fixed inventory
– Stores one sample of each unit and performs prosodic modification to match the required prosody
• Unit-selection-based synthesis
– Stores several instances of each unit, improving the chances of finding a well-matched unit
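Finding the "well-matched unit" in unit-selection synthesis is commonly framed as a dynamic-programming (Viterbi) search that minimizes a target cost (how well a candidate fits the required prosody) plus a join cost (how smoothly neighboring candidates connect). The sketch below assumes that framing; the cost functions are supplied by the caller, and the pitch-only costs in the usage example are purely illustrative.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi search over candidate units, one list per position.

    Minimizes sum(target_cost(i, unit)) + sum(join_cost(prev, unit)).
    Returns (chosen_units, total_cost).
    """
    prev = [target_cost(0, c) for c in candidates[0]]
    back = []  # back[i - 1][k] = best predecessor index for candidate k
    for i in range(1, len(candidates)):
        cur, bp = [], []
        for c in candidates[i]:
            scores = [prev[j] + join_cost(candidates[i - 1][j], c)
                      for j in range(len(prev))]
            j_best = min(range(len(scores)), key=scores.__getitem__)
            cur.append(scores[j_best] + target_cost(i, c))
            bp.append(j_best)
        prev, back = cur, back + [bp]
    k = min(range(len(prev)), key=prev.__getitem__)
    total, path = prev[k], [k]
    for bp in reversed(back):  # trace the best path backwards
        k = bp[k]
        path.append(k)
    path.reverse()
    return [candidates[i][k] for i, k in enumerate(path)], total


# Illustrative costs: units are (name, pitch in Hz); targets are desired pitches.
targets = [100, 115]
units, cost = select_units(
    [[("a1", 100), ("a2", 130)], [("b1", 120), ("b2", 100)]],
    target_cost=lambda i, u: abs(u[1] - targets[i]),
    join_cost=lambda a, b: abs(a[1] - b[1]),
)
print(units, cost)  # [('a1', 100), ('b2', 100)] 15
```

In this example the join cost outweighs the target cost: `b2` fits its target worse than `b1`, but it joins perfectly with `a1`.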
20. Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system built on the Festival framework
– 4355 diphones
– Takes 2 s to generate a 10 s utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 s utterance
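The speed figures above can be compared using the real-time factor (RTF), a standard measure defined as synthesis time divided by the duration of the generated speech. Lower values are faster, and values below 1 are faster than real time. The hardware and test conditions behind each figure are not given, so this comparison is only indicative.

```latex
\mathrm{RTF} = \frac{t_{\text{synthesis}}}{t_{\text{utterance}}},
\qquad
\mathrm{RTF}_{\text{KATHA}} = \frac{2\ \text{s}}{10\ \text{s}} = 0.2,
\qquad
\mathrm{RTF}_{\text{SUBACHAN}} = \frac{0.045\ \text{s}}{10\ \text{s}} = 0.0045,
\qquad
\frac{0.2}{0.0045} \approx 44
```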
21. Speech Signal From Katha and Subachan
• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন (He was primarily a poet, but he also wrote and published a number of essays and articles)
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি (Jibanananda Das is one of the foremost modern Bengali poets of the twentieth century)
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
22. Problems: Homograph Ambiguity
• Homographs are words that share the same spelling but differ in meaning and pronunciation
23. Solution: Homograph Disambiguation
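Collect all possible homograph words
Determine the POS tag of each homograph word
Ex. ছেলেরা মাঠে বল (bol) খেলছে। (The boys are playing ball in the field: বল is a noun, read "bol")
তুমি যাবে কি না বল (bolo)। (Tell me whether you will go or not: বল is a verb, read "bolo")
• Bayes' theorem can also be applied to estimate the most likely pronunciation of a homograph from its context.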
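One way to apply Bayes' theorem here is a naive Bayes classifier that picks the pronunciation maximizing P(pronunciation) × Π P(context feature | pronunciation), with Laplace smoothing for unseen features. This is a minimal sketch, not the presented system's method: the class name, the use of raw context tokens as features (POS tags could be added as extra tokens), and the transliterated training data in the example are all assumptions.

```python
import math
from collections import Counter, defaultdict


class HomographNB:
    """Naive Bayes choice among the pronunciations of one homograph.

    Features are context tokens (neighbouring words and/or POS tags).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                 # Laplace smoothing constant
        self.prior = Counter()             # pronunciation -> example count
        self.feat = defaultdict(Counter)   # pronunciation -> feature counts
        self.vocab = set()

    def fit(self, examples):
        """`examples` is an iterable of (context_tokens, pronunciation)."""
        for context, pron in examples:
            self.prior[pron] += 1
            self.feat[pron].update(context)
            self.vocab.update(context)
        return self

    def predict(self, context):
        total = sum(self.prior.values())
        v = len(self.vocab)
        best, best_lp = None, -math.inf
        for pron, n in self.prior.items():
            denom = sum(self.feat[pron].values()) + self.alpha * v
            lp = math.log(n / total)       # log prior
            for w in context:              # add smoothed log likelihoods
                lp += math.log((self.feat[pron][w] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = pron, lp
        return best


# Hypothetical, transliterated training contexts for the homograph বল.
model = HomographNB().fit([
    (["maathe", "khelchhe"], "bol"),
    (["tumi", "jabe"], "bolo"),
    (["chhelera", "khelchhe"], "bol"),
])
print(model.predict(["tumi"]))  # bolo
```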