Progress on
Bangla Text-To-Speech System
Presented By:
Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
rahmanms@sust.edu
Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of
Bangla TTS
• Discussion…
2
What is a TTS?
• The goal of text-to-speech (TTS) synthesis is to convert an
arbitrary input text into intelligible and natural sounding
speech
– TTS is not a “cut-and-paste” approach that strings together
isolated words
– Instead, TTS employs linguistic analysis to infer correct
pronunciation and prosody (i.e., NLP) and acoustic
representations of speech to generate waveforms (i.e.,
DSP)
3
TTS Applications
Applications:
• Services for the visually impaired community
• Services for people who are illiterate or have difficulty reading
• Enabling use of computers and IT services
– Reading email aloud
– Using a word processor
– Using the Internet
Existing TTS Systems:
• Festival (open-source framework from the University of Edinburgh)
• Bell Labs TTS
4
How TTS Works
5
Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (39 phonemes in English)
• Disadvantage:
– Phonemes ignore the transitional sounds between adjacent phonemes
6
Different TTS Systems (cont’d)
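Diphone-Based TTS System:
• Diphones:
– Span 2 adjacent phonemes (from the middle of one to the middle of the next)
– Incorporate transitional sound
– Produce better sounding speech
– Ex. কক = ক + কঅ + অক + ক
• Disadvantage:
– Over 1500 diphones in the English language (39 × 39 = 1521 phoneme pairs are possible in principle)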
7
Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations,
etc., into the equivalent of written-out words
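As a rough illustration of this step, the sketch below expands a few abbreviations and spells out digits one at a time. The abbreviation table, the digit names and the function names are assumptions made for the example, and English is used only for readability; a Bangla front end would need its own tables and proper number reading.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "Engg.": "Engineering"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def spell_digits(match):
    """Spell a run of digits one digit at a time (not as a full number)."""
    return " ".join(DIGITS[int(ch)] for ch in match.group())

def normalize(text):
    """Replace known abbreviations and digit runs with written-out words."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    return re.sub(r"[0-9]+", spell_digits, text)

print(normalize("Dr. Rahman, Dept. of CSE, room 30"))
```

Reading "30" as "thirty" rather than "three zero" would need a number-to-words routine, which is left out here to keep the sketch short.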
8
Word to Diphone Converter
(Phonetization)
• Purpose
– Translate words to their diphone representations
(Ex. রাজা -> Diphones: {র + রআ + আজ + জআ})
– Divide the text into prosodic units such as phrases,
clauses and sentences
• Resource
– Dictionary of words and their diphones
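A minimal sketch of the conversion, assuming the word has already been split into phonemes. The function name `to_diphones` and the rule for the word-final unit are assumptions inferred from the two examples on these slides (রাজা and কক); the slides describe a dictionary lookup rather than a rule, so this is only one possible way to fill such a dictionary.

```python
def to_diphones(phonemes, vowels):
    """Return diphone-style units for a phoneme sequence.

    Units: the word-initial phoneme, every adjacent pair, and (assumed rule)
    the word-final phoneme when it is a consonant.
    """
    if not phonemes:
        return []
    units = [phonemes[0]]
    units += [a + b for a, b in zip(phonemes, phonemes[1:])]
    if len(phonemes) > 1 and phonemes[-1] not in vowels:
        units.append(phonemes[-1])
    return units

VOWELS = {"অ", "আ"}  # only the vowels needed for the two examples
print(" + ".join(to_diphones(["র", "আ", "জ", "আ"], VOWELS)))  # রাজা
print(" + ".join(to_diphones(["ক", "অ", "ক"], VOWELS)))        # কক
```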
9
Prosody
[Figure: block diagram with the blocks Diphone Retrieval, Concatenation, Acoustic Manipulation, Diphone Database and Prosody Parameters]
10
Properties of Speech
[Figure: speech waveform, e.g. cat.wav, with one periodic region and two non-periodic regions marked]
11
Altering Pitch/Duration/Amplitude
• For smooth concatenation, altering pitch,
duration and amplitude at the concatenation
point is very important.
12
Altering Pitch
[Figure: extracted pitch period of the original diphone × Hanning window = Hanned pitch period]
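The windowing step on this slide can be sketched in a few lines. The function names and the toy segment are assumptions for illustration; in practice the segment would be cut from a recorded diphone around a pitch mark.

```python
import math

def hann(n):
    """Hann (Hanning) window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def window_segment(samples):
    """Multiply a segment sample-by-sample with a Hann window."""
    return [s * w for s, w in zip(samples, hann(len(samples)))]

# Toy segment: 9 samples of constant amplitude, standing in for a pitch period.
segment = [1.0] * 9
print([round(x, 3) for x in window_segment(segment)])
```

The window tapers both ends of the segment to zero, which is what lets neighbouring segments be summed later without clicks at their edges.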
13
PSOLA – Pitch Synchronous Overlap and Add
[Figure: Hanned pitch periods combined by 50% overlap and add]
• Overlap > 50%: pitch goes up
• Overlap < 50%: pitch goes down
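The overlap-and-add step might look like the sketch below. The names `overlap_add` and `overlap_fraction` and the triangular toy segments are assumptions; real input would be Hann-windowed pitch periods. Placing the segments closer together (more overlap) shortens the spacing between pitch marks and so raises the pitch, and the reverse lowers it.

```python
def overlap_add(segments, hop):
    """Sum equal-length windowed segments whose starts are `hop` samples apart."""
    seg_len = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for k, seg in enumerate(segments):
        start = k * hop
        for i, s in enumerate(seg):
            out[start + i] += s
    return out

def overlap_fraction(seg_len, hop):
    """Fraction of each segment that overlaps its successor."""
    return 1.0 - hop / seg_len

# Three identical triangular segments stand in for Hann-windowed pitch periods.
seg = [0.0, 0.5, 1.0, 0.5]
print(overlap_add([seg] * 3, hop=2), overlap_fraction(4, 2))  # 50% overlap
print(overlap_add([seg] * 3, hop=1), overlap_fraction(4, 1))  # more overlap: pitch up
print(overlap_add([seg] * 3, hop=3), overlap_fraction(4, 3))  # less overlap: pitch down
```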
14
Altering Duration
• Increase number of PSOLA iterations
(overlaps) to increase duration
• Decrease number of PSOLA iterations
(overlaps) to decrease duration
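One way to picture this is as repeating or skipping pitch-period segments before they are overlap-added at unchanged spacing, so the pitch stays the same while the length changes. The function name `retime` and the string placeholders are assumptions for illustration.

```python
def retime(segments, factor):
    """Repeat or skip pitch-period segments to scale their count by `factor`."""
    n_out = max(1, round(len(segments) * factor))
    # Map each output slot back to an input segment.
    return [segments[min(len(segments) - 1, int(j / factor))] for j in range(n_out)]

periods = ["p0", "p1", "p2", "p3"]  # placeholders for windowed pitch periods
print(retime(periods, 2.0))   # each period used twice: longer
print(retime(periods, 0.5))   # every other period dropped: shorter
```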
15
Altering Amplitude
• Multiply the signal by a constant
– If constant > 1, amplitude increases
– If constant < 1, amplitude decreases
16
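Amplitude scaling is a single multiplication per sample. The clipping to a fixed range in this sketch is an added assumption (samples are taken to lie in [-1, 1]); the slide itself only describes the multiplication.

```python
def scale(samples, gain, limit=1.0):
    """Multiply every sample by `gain`, clipping to [-limit, limit]."""
    return [max(-limit, min(limit, s * gain)) for s in samples]

print(scale([0.1, -0.4, 0.6], 2.0))   # louder; 0.6 * 2 is clipped to the limit
print(scale([0.1, -0.4, 0.6], 0.5))   # quieter
```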
Concatenation
Diphones → Word
• Use PSOLA at the joining ends
• Ensures smooth transition
Words → Sentence
• Straight joining at the end points, since pauses are
present there
17
Putting It All Together
[Figure: TTS system pipeline from input text through Pre-processing (producing words), Prosody and Concatenation]
18
Types of Concatenative Speech Synthesis
• Concatenative synthesis with a fixed inventory
– Contains one sample of each unit and applies
prosodic modification to match the required
prosody
• Unit-selection-based synthesis
– Stores several instances of each unit, thus
improving the chances of finding a well-matched
unit
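Unit selection is commonly framed as a search for the candidate sequence with the lowest combined target cost (how well a unit fits the wanted prosody) and join cost (how well neighbours fit each other). The sketch below is a small dynamic-programming version under that assumption; the cost functions, the pitch-pair representation of units and all names are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate per position minimising total target + join cost."""
    # best[i][j] = (cost of cheapest path ending in candidates[i][j], back-pointer)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in candidates[i]:
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, c), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(targets[i], c), prev))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]

# Toy units: (pitch at start, pitch at end) in Hz; targets: desired mean pitch.
def target_cost(t, u):
    return abs(t - (u[0] + u[1]) / 2)

def join_cost(a, b):
    return abs(a[1] - b[0])

targets = [100, 110]
candidates = [[(100, 100), (95, 110)], [(110, 110), (120, 130)]]
print(select_units(targets, candidates, target_cost, join_cost))
```

In this toy case the first position does not take the unit that matches its target best, because the other candidate joins more smoothly to what follows.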
19
Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system using the Festival framework
– 4355 diphones
– Takes 2 s to generate a 10 s utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 s utterance
20
Speech Signal From Katha and Subachan
• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
("Although primarily a poet, he also wrote and published a number of essays")
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
("Jibanananda Das is one of the foremost modern Bangla poets of the twentieth century")
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
21
Problems: Homograph Ambiguity
• Homographs are words that share the same spelling
but differ in meaning and pronunciation
22
Solution: Homograph Disambiguation
• Collect all possible homograph words
• Determine the POS tag of each homograph
Ex. ছেলেরা মাঠে বল (bol) খেলছে। ("The boys are playing ball in the field.")
তুমি যাবে কি না বল (bolo)। ("Say whether or not you will go.")
• Bayes' theorem can also be applied to estimate the likelihood of each
reading of a word.
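The Bayes idea can be sketched as a tiny naive Bayes classifier that picks between the readings "bol" and "bolo" from the surrounding words. The training pairs, their romanised spellings and the function names are all hypothetical; real counts would come from a tagged Bangla corpus, and POS tags could be used as features in the same way.

```python
import math
from collections import Counter

# Hypothetical training data: (context words, pronunciation) pairs,
# romanised for readability.
TRAINING = [
    (["chhelera", "mathe", "khelchhe"], "bol"),
    (["ekta", "kinlam"], "bol"),
    (["tumi", "jabe", "ki", "na"], "bolo"),
    (["amake", "tumi"], "bolo"),
]

def train(data):
    """Count how often each reading and each (reading, context word) occurs."""
    priors = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in priors}
    for words, label in data:
        word_counts[label].update(words)
    vocab = {w for words, _ in data for w in words}
    return priors, word_counts, vocab

def classify(context, priors, word_counts, vocab):
    """Return the reading with the highest posterior (add-one smoothing)."""
    total = sum(priors.values())
    scores = {}
    for label, n in priors.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in context:
            score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify(["tumi", "jabe"], *model))
print(classify(["mathe", "khelchhe"], *model))
```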
23
Problems: Improper Concatenation
[Figure: signal from the utterance of রাশেদ (Rashed), with a region marked "Not concatenated properly"]
24
Solution: Improper Concatenation
• PSOLA
• Reducing the number of concatenation points
– Ex 1. Sentence -> কামাল ভাল ছেলে। ("Kamal is a good boy.")
Diphones -> কা + আমা + আল   ভা + আলো   ছে + এলে
instead of ক + কআ + আম + মআ + আল + ল …
– Ex 2. ফলা: পৃথিবী -> পৃ + ইথি + ইবী
• Vowel sounds are periodic, and thus suitable points for
concatenation
• Use the 1000 most frequently spoken words
Duration Modeling
26
Duration Modeling
27
Thank you all!
Suggestions??
28
Sound Synthesized by Katha
• Katha
29
Sound Synthesized by Subachan
• Subachan
30
