10:02 10:02
Text-to-Speech System for
Gujarati
Project Presentation by Samyak Bhuta
10:02 10:02
* PROJECT PROFILE *
Objective : Developing a Text-to-Speech
System for Gujarati
10:02 10:02
* PROJECT PROFILE *
Under the guidance of
 Prof. Ram Mohan
 Shri Jignesh Dholakia
10:02 10:02
* PROJECT PROFILE *
At Resource Centre for Indian Language
Technology Solutions in Gujarati,
Faculty of Arts,
The M. S. University of Baroda, BARODA.
10:02 10:02
Next 25 minutes …
> Sound and Speech Sound
> ABC of TTS Systems
> Pilot Project
> GTTS from scratch
> Speech, Syllable and Partneme
> Speech Sounds in detail
> Core Engine
> Language Dependent Components
10:02 10:02
Sound : a flow of air
[Figure: air flows from the Source to the Ear as Sound]
10:02 10:02
What makes different sounds ?
 The factors responsible for the perceptual
difference between one kind of sound and
another are
 Amplitude (or volume), which tells how much
power the air-flow carries
 Frequency (or pitch), which tells at what rate
the air-flow repeats itself
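The two factors can be illustrated with the simplest possible air-flow pattern, a pure sine tone. This is only a sketch: real speech sounds are far more complex, and the function name `sine_tone` and the 16 kHz sampling rate are assumptions made for the illustration, not part of the project.

```python
import math

def sine_tone(amplitude, frequency_hz, duration_s, sample_rate=16000):
    """Return samples of amplitude * sin(2 * pi * f * t) for a pure tone."""
    n_samples = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(n_samples)
    ]

# Same shape of wave, but different volume (amplitude) and pitch (frequency).
quiet_low = sine_tone(amplitude=0.2, frequency_hz=220, duration_s=0.01)
loud_high = sine_tone(amplitude=0.8, frequency_hz=440, duration_s=0.01)
```

Changing only `amplitude` makes the tone louder or quieter; changing only `frequency_hz` makes it sound higher or lower.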
10:02 10:02
The “Source” doesn’t matter
 An air-flow of kind A will sound the same
whether it is generated by source X
or source Y.
10:02 10:02
Speech Sound
 A kind of sound whose source is the
Human Vocal Organism and which
finds its place in human speech.
 e.g. ક્ , સ્ , અ , ઈ
 A standard called International Phonetic
Alphabet (IPA) is used to depict such sounds
10:02 10:02
IPA
 IPA comprises almost all the speech sounds
of all languages in the world.
 Speech sounds are more formally known as
Phones
 IPA uses a set of symbols to represent them
e.g. k , s , ə , i , ʤ
 IPA Chart …
10:02 10:02
IPA Chart
10:02 10:02
Synthesized Speech Sound
 If we can produce the same pattern of
air-flow that the Human Vocal Organism
produces for a speech sound,
we can say that we have synthesized that
speech sound
10:02 10:02
Speech Synthesizer
 A mechanism which is capable of producing
synthesized speech sounds in a controlled
manner.
10:02 10:02
Text-to-Speech Systems
 A Speech Synthesizer which is smart enough
to produce the Speech output equivalent to
the given text.
 This smartness goes into making the
output as natural and intelligible as
possible.
10:02 10:02
Text-to-Speech Systems
 Usually, a TTS System is specific to
only one human language and takes input
text from only that language
10:02 10:02
Basic structure of TTS Systems
 The function of any TTS System is generally
divided into three subtasks or phases.
I. Preprocessing
II. Phonetic-Prosodic Translation
III. Speech Production
 The text input travels through these
phases, one by one, and eventually ends
up as speech.
10:02 10:02
Preprocessing
 “Dr. Ajay Shah will come to clinic on 23 ,Jan.”
 We read it …
“DOCTOR Ajay Shah will come to clinic on
TWENTY THIRD OF JANUARY”.
 Preprocessing converts the input text
from its raw condition into fully
spelled-out, pronounceable words.
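The example sentence above can be normalized with a few lines of code. This is a minimal sketch for English only, assuming just two rule types (abbreviations and "day, month" dates); the names `ABBREVIATIONS`, `MONTHS`, `ordinal` and `preprocess` are hypothetical and are not taken from the project.

```python
import re

# Hypothetical rule tables; a real preprocessor would need many more entries.
ABBREVIATIONS = {"Dr.": "Doctor"}
MONTHS = {
    "Jan": "January", "Feb": "February", "Mar": "March", "Apr": "April",
    "Jun": "June", "Jul": "July", "Aug": "August", "Sep": "September",
    "Oct": "October", "Nov": "November", "Dec": "December",
}
LOW_ORDINALS = [
    "", "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth",
]

def ordinal(day):
    """Spell out a day of the month (1..31) as an ordinal."""
    if day < 20:
        return LOW_ORDINALS[day]
    if day % 10 == 0:
        return {20: "twentieth", 30: "thirtieth"}[day]
    tens = "twenty" if day < 30 else "thirty"
    return f"{tens} {LOW_ORDINALS[day % 10]}"

# Matches a day number, an optional comma, a month abbreviation, an optional dot.
DATE_RE = re.compile(r"\b(\d{1,2})\s*,?\s*(" + "|".join(MONTHS) + r")\b\.?")

def _expand_date(match):
    day = int(match.group(1))
    if not 1 <= day <= 31:
        return match.group(0)  # not a plausible date, leave it untouched
    return f"{ordinal(day)} of {MONTHS[match.group(2)]}"

def preprocess(text):
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    return DATE_RE.sub(_expand_date, text)

print(preprocess("Dr. Ajay Shah will come to clinic on 23 ,Jan."))
# Doctor Ajay Shah will come to clinic on twenty third of January
```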
10:02 10:02
Phonetic-Prosodic Translation
 This phase can be logically divided into two
different phases,
• Phonetic Translation
• Prosodic Translation
 Real TTS Systems may implement these
phases separately or as a unit but together
they provide data for the next phase of TTS.
10:02 10:02
Phonetic Translation
 In human languages, the script in use
does not necessarily possess a one-to-one
mapping with speech.
 e.g.
enough is pronounced as INAF, /inəf/ in IPA
છોકરો is pronounced as છોક્રો, /ʧokɾo/ in IPA
10:02 10:02
Phonetic Translation
 Phonetic Translation tells the next phase
exactly which speech sounds (phones) are to
be produced for the given text.
 The rules used for Phonetic Translation are
also known as Letter-to-Sound rules.
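A toy set of letter-to-sound rules is sketched below. It assumes three simplified rules (every consonant carries an inherent ə, a vowel sign or virama replaces or removes that ə, and a word-final ə is dropped) and covers only a handful of Gujarati characters. The tables and the name `to_ipa` are illustrative assumptions, not the project's actual rules, and the sketch does not handle word-internal deletion such as the one in છોકરો.

```python
# Tiny illustrative tables; a real system needs the full Gujarati inventory.
CONSONANTS = {"ક": "k", "ર": "ɾ", "ણ": "ɳ", "મ": "m", "પ": "p", "ત": "t"}
VOWEL_SIGNS = {"ા": "a"}   # dependent vowel signs (matras)
VIRAMA = "્"               # suppresses the inherent vowel
SCHWA = "ə"

def to_ipa(word):
    phones = []
    for ch in word:
        if ch in CONSONANTS:
            phones += [CONSONANTS[ch], SCHWA]   # consonant + inherent schwa
        elif ch in VOWEL_SIGNS and phones and phones[-1] == SCHWA:
            phones[-1] = VOWEL_SIGNS[ch]        # vowel sign replaces the schwa
        elif ch == VIRAMA and phones and phones[-1] == SCHWA:
            phones.pop()                        # virama removes the schwa
        else:
            raise ValueError(f"no rule for {ch!r}")
    if phones and phones[-1] == SCHWA:
        phones.pop()                            # assumed word-final schwa drop
    return "".join(phones)

print(to_ipa("રામ"), to_ipa("કરણ"), to_ipa("પત્ર"))
```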
10:02 10:02
Prosodic Translation
 Mapping with letter-to-sound rules only
provides information about the kind of speech
sound to be generated. To convey the
emotions and expressions residing in the
input text, Prosody needs to be applied.
 By Prosody we mean,
Amplitude + Pitch + Duration
10:02 10:02
Speech Production
 This phase is responsible for actual output
of the speech.
 The phase uses the phonetic and prosodic
information provided from the previous
phase.
 Various approaches exist for production of
speech.
10:02 10:02
Different ways for Speech Production
 Three widely used approaches for speech
production are
• Articulatory Synthesis
• Source-Filter Synthesis
• Concatenative Synthesis
 The speech production part of a TTS System
is generally referred to as the speech engine.

10:02 10:02
Use cases
 As we understood the structure of TTS
Systems, we realized that all three phases are
required in order to develop a complete TTS
for Gujarati.
 At the topmost abstraction level, a use case
can be conceived for fulfilling the requirement
of having a TTS System for Gujarati.
10:02 10:02
Use cases
 The topmost use case can then be divided
into three further use cases, each fulfilling
the requirement of one of the three phases
 During the project we tried to realize each
use case one by one.
10:02 10:02
Pilot Project
 As we approached the various requirements
and use cases to be realized, we found that
developing a Preprocessor is less
significant than developing the other two
phases, so we decided to develop it later on.
 We decided to develop the Phonetic-Prosodic
Translation phase first, as it can be easily
plugged into any already built … speech
10:02 10:02
Pilot Project
… speech engine that takes its input in terms
of IPA.
 FreeTTS, IBMJS, Dhvani and Narad were
studied
 We used the Java Speech API along with IBMJS
as the speech engine.
 The input to the engine was provided through
Java Speech Markup Language (JSML)
10:02 10:02
Pilot Project : Objective
 To develop a TTS System using an already
available Speech Engine, supplying the engine
with the equivalent IPA transcription of the
target Gujarati Unicode text.
10:02 10:02
Pilot Project : S/W Requirement
 A Speech Engine Component which takes
IPA and speaks it out.
10:02 10:02
Pilot Project : Design
 A number of use cases were conceived and
their implementations were provided as separate
Java classes.
10:02 10:02
Pilot Project : Conclusion
 We cannot continue developing a TTS
System with an “outsider” speech engine, as
the accent and other characteristics need to be
Gujarati in nature.
10:02 10:02
Starting of GTTS from Scratch
 From the result of the Pilot Project we
concluded that the Speech Engine has to be
developed with Gujarati in mind.
 The concatenative approach was to be used,
since it provides naturalness and has a proven
track record.
10:02 10:02
Concatenation
 In the concatenative approach, already stored
segments of sound are joined together to
produce the complete speech.
 Such segments are known as concatenation
units.
 We used Partnemes as our concatenation
unit.
10:02 10:02
Partnemes
 A partneme is a very small segment of sound
whose typical length ranges from 8 ms to
100 ms. We get the partnemes by cutting
up the recorded speech.
 But before understanding what a partneme is,
we have to understand human speech in
greater detail, especially the relation
between speech and the syllable.
10:02 10:02
How we speak ?
 In normal breathing, the breath-in and
breath-out periods of a complete breath cycle
are of broadly comparable length.
 But when we start speaking, the breath-in
period becomes much shorter, paving the way for
a much longer breath-out period.
 This is because to speak out (anything) we
need some air-flow. We use the air-flow …
10:02 10:02
How we speak ? : Human Vocal Tract
… powered by lungs, during breath-out.
 This air-flow is modified at various points
of the Human Vocal Tract, ending up as one
kind of speech sound (phone) or another.
 The Human Vocal Tract comprises various
organs which, in one way or another,
change the air-flow.
 Human Vocal Tract …
10:02 10:02
[Figure: Human Vocal Tract]
10:02 10:02
How we speak ? : Syllable and Speech
 During one complete breath cycle
we can speak out more than one phone.
 All the phones spoken out in just one
breath cycle constitute a syllable.
 A continuous sequence of such syllables
forms speech.
10:02 10:02
How we speak ? : Syllable Structure
 It is important to know the structure of
syllable in order to understand partnemes.
 Typically a syllable is made up of a vowel as
its nucleus with consonants around it.
 Gujarati employs the following syllable
structure.
< C + C + C + V + V̯ + C + C >
10:02 10:02
How we speak ? : Syllable Structure
 < C + C + C + V + V̯ + C + C >
where C - consonant
V - vowel
V̯ - non-syllabic vowel
 An utterance (spoken word) is made up
of a series of such syllables.
10:02 10:02
How we speak ? : Syllable Structure
 રામ - ɾam is made up of a single syllable.
Here the structure becomes
< ɾ (C) + a (V) + m (C) >.
 પત્ર - pətɾ is also made up of a single syllable.
Here the structure becomes
< p (C) + ə (V) + t (C) + ɾ (C) >
 લશ્કર - ləʃkəɾ is made up of two syllables.
Here the structure becomes
< l (C) + ə (V) + ʃ (C) > < k (C) + ə (V) + ɾ (C) >
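The syllable template can be checked mechanically. The sketch below assumes that every slot other than the vowel nucleus is optional, which the slides suggest through the CVC and CVCC examples but do not state outright. The letter `v` stands for the non-syllabic vowel V̯, and the name `fits_template` is hypothetical.

```python
import re

# C = consonant, V = vowel, v = non-syllabic vowel (written V̯ on the slide).
# Up to three consonants, one vowel, an optional glide, up to two consonants.
SYLLABLE_RE = re.compile(r"C{0,3}Vv?C{0,2}")

def fits_template(shape):
    """Return True if a C/V/v shape string fits the assumed syllable template."""
    return SYLLABLE_RE.fullmatch(shape) is not None

for shape in ["CVC", "CVCC", "CCCVvCC", "CVCCC", "CC"]:
    print(shape, fits_template(shape))
```

Here `CVC` (ɾam) and `CVCC` (pətɾ) fit the template, while a shape with three final consonants or without a vowel does not.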
10:02 10:02
How we speak ? : Consonants and Vowels
 Consonants and vowels are two different
kinds of speech sounds with different
acoustic parameters.
 To know the exact difference between
consonants and vowels we have to
understand how the single vocal tract is
capable of producing so many different
sounds.
10:02 10:02
How we speak ? : Articulation
 Modification of the air-flow is achieved by
articulation of various speech organs of the
vocal tract.
 The exact nature of speech sound that will
come up during the breath-out is determined
by
1 Place of Articulation
2 Manner of Articulation
10:02 10:02
How we speak ? : Place of articulation
 Place of articulation refers to the exact point
in the human vocal tract where articulation
happens.
e.g. [p] - two lips
[k] - back of tongue with velum
[ɾ] - tip of tongue with alveolar ridge
10:02 10:02
How we speak ? : Manner of articulation
 Manner of articulation refers to the degree
of constriction made during the articulation.
e.g. [p] - stop or plosive
[ʧ] - affricate
[ɾ] - tap
[j] - glide
[o] - vowel (no constriction)
10:02 10:02
How we speak ? : Voicedness
 If the vocal cords are vibrating (and thus
changing the air-flow) while the air-flow
travels through the glottis, we get a voiced
sound.
e.g. [g] - voiced
[k] - unvoiced
10:02 10:02
How we speak ? : Aspiration
 Aspiration refers to the state of the vocal
cords during the final stage of speaking out
a phone. When we speak out an aspirated
phone, the vocal cords approach the vibrating
state only gradually (irrespective of the
phone's voicedness).
e.g. [kʰ] - aspirated
[k] - unaspirated
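The four properties discussed so far (place, manner, voicedness, aspiration) can be kept together as a small feature record per phone. This is a sketch only: the `Phone` class and `differing_features` helper are hypothetical names, and the feature values not given on the slides (for example that [k] is a plosive and [p] is unvoiced) are filled in from standard phonetics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phone:
    symbol: str
    place: str
    manner: str
    voiced: bool
    aspirated: bool

PHONES = {
    p.symbol: p
    for p in [
        Phone("p", "bilabial", "plosive", voiced=False, aspirated=False),
        Phone("k", "velar", "plosive", voiced=False, aspirated=False),
        Phone("kʰ", "velar", "plosive", voiced=False, aspirated=True),
        Phone("g", "velar", "plosive", voiced=True, aspirated=False),
    ]
}

def differing_features(a, b):
    """List the feature names on which two phones differ."""
    names = ("place", "manner", "voiced", "aspirated")
    return [n for n in names if getattr(a, n) != getattr(b, n)]

print(differing_features(PHONES["k"], PHONES["g"]))   # ['voiced']
print(differing_features(PHONES["k"], PHONES["kʰ"]))  # ['aspirated']
print(differing_features(PHONES["p"], PHONES["k"]))   # ['place']
```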
10:02 10:02
Segmentation and Partneme
 Segmentation into partnemes is achieved by
cutting up the recorded syllable.
 The figure shows the sound waveform for ગમન
built from partnemes. Red lines mark the
separations.
10:02 10:02
Partnemes
 As shown, a syllable is logically divided into
 null sound to consonant transition
 core consonant
 consonant to vowel transition
 core vowel
 vowel to consonant transition
 core consonant
 consonant to null sound transition
10:02 10:02
Partnemes
 If we can provide the partnemes for each
vowel and consonant we can join them
accordingly to produce any complete syllable
and hence any utterance.
e.g.
કરણ - kəɾəɳ
0_k;k;k_ə;ə;ə_ɾ;ɾ;ɾ_ə;ə;ə_ɳ;ɳ;ɳ_0
10:02 10:02
ભારત - bʰaɾət
0_bʰ;bʰ;bʰ_a;a;a_ɾ;ɾ;ɾ_ə;ə;ə_t;t;t_0
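The partneme sequences in these examples follow a regular pattern: a transition into each phone, then the core of that phone, with the null sound `0` at both ends. A minimal sketch that generates the same notation from a list of phones is given below; the function name `partneme_sequence` is an assumption for illustration.

```python
NULL = "0"  # the slides write the null sound (silence) as 0

def partneme_sequence(phones):
    """Build the ';'-separated partneme sequence for a list of IPA phones."""
    if not phones:
        return ""
    units = []
    previous = NULL
    for phone in phones:
        units.append(f"{previous}_{phone}")  # transition into the phone
        units.append(phone)                  # core of the phone
        previous = phone
    units.append(f"{previous}_{NULL}")       # transition back to silence
    return ";".join(units)

print(partneme_sequence(["k", "ə", "ɾ", "ə", "ɳ"]))
# 0_k;k;k_ə;ə;ə_ɾ;ɾ;ɾ_ə;ə;ə_ɳ;ɳ;ɳ_0
```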
10:02 10:02
Core Engine
 The speech engine we developed
concatenates such partneme sequences
based on the given IPA, and uses a pair of files.
 One, called the Voice File, contains the audio
data of all the partnemes.
 The other serves as a reference to the
Voice File and is called the Voice Info File.
It contains the place and length of each
partneme in the Voice File.
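The lookup-and-join step can be sketched as follows. The actual file formats are not described in the slides, so this sketch assumes a Voice Info File with one `<partneme> <offset> <length>` entry per line and a Voice File that is a flat run of bytes; the names `load_voice_info` and `synthesize` are hypothetical, and the join is plain concatenation with no smoothing at the boundaries.

```python
def load_voice_info(text):
    """Parse '<partneme> <offset> <length>' lines (assumed format) into a dict."""
    info = {}
    for line in text.splitlines():
        if line.strip():
            name, offset, length = line.split()
            info[name] = (int(offset), int(length))
    return info

def synthesize(sequence, voice_data, voice_info):
    """Concatenate the audio bytes of each partneme in a ';'-separated sequence."""
    out = bytearray()
    for name in sequence.split(";"):
        offset, length = voice_info[name]
        out += voice_data[offset:offset + length]
    return bytes(out)

# Stand-in data: letters play the role of audio samples.
voice_data = b"AAAABBBBBBCC"
voice_info = load_voice_info("0_k 0 4\nk 4 6\nk_0 10 2\n")
audio = synthesize("0_k;k;k_0", voice_data, voice_info)
```

Keeping the index separate from the audio means a new voice only needs a new pair of files, while the engine code stays unchanged.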
10:02 10:02
Core Engine
 The Core Engine realizes the use case of
having a speech engine.
10:02 10:02
Language Dependent Components
 Since the Core Engine only understands an IPA
sequence, we have to provide a component
which translates the Gujarati text to an IPA
sequence.
 Preprocessing capabilities also need to
be developed for a complete TTS System.
 Unlike the Core Engine, both aforementioned
components would be specific to a particular
language and …
10:02 10:02
Language Dependent Components
… therefore kept aside as language dependent
components.
 Preprocessor :
As preprocessing should be highly
customizable by the end user, we
have provided a text file which can be
edited to control the functionality of the
preprocessor.
10:02 10:02
 IPATranscriptor : This component currently
provides only the phonetic translation of the given
Gujarati text, as complete rules for prosodic
translation are not available.
10:02 10:02
Thanks
 Prof. Bhartiben Modi
 Mr. Ajay Sarvaiya
 Mr. Irshad Shaikh
 Mr. Mihir Trivedi
10:02 10:02
Sloka
Having grasped the meanings through the intellect, the soul joins the mind
with the desire to speak. The mind kindles the bodily fire, and that (bodily
fire) impels the vital breath. That impelled air, striking the head (crown),
reaching the mouth and passing through the respective places, gives rise to
the five kinds of speech sounds through the contribution of tone, duration,
place, and external and internal effort.
- Paniniya Shiksha, tenth chapter, karika 6, 9.

Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Dernier (20)

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Gujarati Text-to-Speech Presentation

  • 1. Text-to-Speech System for Gujarati. Project presentation by Samyak Bhuta.
  • 2. Project profile. Objective: developing a Text-to-Speech (TTS) system for Gujarati.
  • 3. Project profile. Under the guidance of Prof. Ram Mohan and Shri Jignesh Dholakia.
  • 4. Project profile. At the Resource Centre for Indian Language Technology Solutions in Gujarati, Faculty of Arts, The M. S. University of Baroda, Baroda.
  • 5. Next 25 minutes: Sound and Speech Sound; ABC of TTS Systems; Pilot Project; GTTS from scratch; Speech, Syllable and Partneme; Speech Sounds in detail; Core Engine; Language Dependent Components.
  • 6. Sound: a flow of air. [Diagram: air flows from a source to the ear, where it is perceived as sound.]
  • 7. What makes sounds different? The factors responsible for the perceptual difference between one kind of sound and another are: amplitude (or volume), which tells how much power the air-flow carries; and frequency (or pitch), which tells at what rate the air-flow repeats itself.
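The two factors on this slide can be illustrated with a pure tone, whose samples follow amplitude × sin(2π × frequency × t). The sketch below is only an illustration: the function name `sine_wave` and the 8000 Hz sample rate are assumptions made here, not part of the presentation.

```python
import math

def sine_wave(amplitude, frequency_hz, duration_s, sample_rate=8000):
    """Return the samples of a pure tone: amplitude * sin(2*pi*f*t)."""
    n = round(duration_s * sample_rate)  # number of samples to generate
    return [amplitude * math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(n)]

# Same duration, different amplitude (volume) and frequency (pitch).
quiet_low = sine_wave(amplitude=0.2, frequency_hz=220, duration_s=0.01)
loud_high = sine_wave(amplitude=0.8, frequency_hz=440, duration_s=0.01)
```

Changing only `amplitude` scales every sample, while changing only `frequency_hz` changes how often the pattern repeats within the same time span.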
  • 8. The “source” doesn’t matter. An air-flow of kind A will sound the same whether it was generated by source X or source Y.
  • 9. Speech Sound. A kind of sound whose source is the human vocal organs and which finds its place in human speech, e.g. ક્ , સ્ , અ , ઈ. A standard called the International Phonetic Alphabet (IPA) is used to depict such sounds.
  • 10. IPA. The IPA covers almost all the speech sounds of all the languages in the world. Speech sounds are more formally known as phones. The IPA uses a set of symbols to represent them, e.g. k , s , ə , i , ʤ. IPA Chart …
  • 12. Synthesized Speech Sound. If we can produce the same pattern of air-flow that the human vocal organs produce for a speech sound, we can say that we have synthesized that speech sound.
  • 13. Speech Synthesizer. A mechanism capable of producing synthesized speech sounds in a controlled manner.
  • 14. Text-to-Speech Systems. A speech synthesizer that is smart enough to produce speech output equivalent to the given text. The smartness lies in making the output as natural and intelligible as possible.
  • 15. Text-to-Speech Systems. Usually, a TTS system is specific to one human language and takes input text only from that language.
  • 16. Basic structure of TTS Systems. The function of a TTS system is generally divided into three subtasks or phases: I. Preprocessing; II. Phonetic-Prosodic Translation; III. Speech Production. The input text travels through these phases one by one and eventually ends up as speech.
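The three-phase structure can be sketched as three functions chained together. Everything below is a hypothetical stand-in: the function names, the placeholder prosody values and the string "audio" are assumptions used only to show how data flows from one phase to the next, not the system described in the presentation.

```python
def preprocess(raw_text):
    """Phase I (stub): raw text -> list of pronounceable words."""
    return raw_text.split()

def phonetic_prosodic_translation(words):
    """Phase II (stub): words -> (phone, prosody) pairs.

    Each character stands in for a phone and gets a fixed placeholder duration.
    """
    return [(ch, {"duration_ms": 80}) for word in words for ch in word]

def speech_production(phones):
    """Phase III (stub): phones -> 'audio', here just a readable string."""
    return "|".join(symbol for symbol, _prosody in phones)

def text_to_speech(raw_text):
    """Run the input through the three phases, one after the other."""
    return speech_production(phonetic_prosodic_translation(preprocess(raw_text)))
```

Keeping each phase behind its own function means a phase can be replaced (for example, swapping the speech engine) without touching the other two.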
  • 17. Preprocessing. Input: “Dr. Ajay Shah will come to clinic on 23 Jan.” We read it as “DOCTOR Ajay Shah will come to clinic on TWENTY THIRD OF JANUARY”. Preprocessing converts the raw input text into pronounceable words.
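A minimal sketch of this kind of normalization for the English example on the slide. The lookup tables are tiny illustrative assumptions (a real preprocessor would need full abbreviation, month and number-to-words tables), and `normalize` is a name chosen here.

```python
import re

# Tiny illustrative tables; real coverage would be far larger.
ABBREVIATIONS = {"Dr.": "Doctor"}
MONTHS = {"Jan.": "January", "Feb.": "February"}
ORDINALS = {1: "first", 2: "second", 3: "third", 23: "twenty third"}

# A day number followed by an abbreviated month, e.g. "23 Jan." or "23 ,Jan."
DATE = re.compile(r"(\d{1,2})\s*,?\s*([A-Z][a-z]{2}\.)")

def expand_date(match):
    day, month = int(match.group(1)), match.group(2)
    if day in ORDINALS and month in MONTHS:
        return f"{ORDINALS[day]} of {MONTHS[month]}"
    return match.group(0)  # leave anything unknown untouched

def normalize(text):
    """Expand dates first, then the remaining abbreviations."""
    text = DATE.sub(expand_date, text)
    for abbreviation, full in ABBREVIATIONS.items():
        text = text.replace(abbreviation, full)
    return text

print(normalize("Dr. Ajay Shah will come to clinic on 23 Jan."))
# prints: Doctor Ajay Shah will come to clinic on twenty third of January
```

Dates are expanded before abbreviations so that the month's trailing period is consumed as part of the date rather than treated as sentence punctuation.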
  • 18. Phonetic-Prosodic Translation. This phase can be logically divided into two phases: Phonetic Translation and Prosodic Translation. Real TTS systems may implement them separately or as a unit, but together they provide the data for the next phase.
  • 19. Phonetic Translation. In human languages, the script in use does not necessarily have a one-to-one mapping with speech, e.g. "enough" is pronounced INAF (IPA: inəf), and છોકરો is pronounced છોક્રો (IPA: ʧokɾo).
  • 20. Phonetic Translation. Phonetic translation tells the next phase exactly which speech sounds (phones) are to be produced for the given text. It is also known as letter-to-sound rules.
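A rough sketch of what letter-to-sound rules for Gujarati script might look like. It assumes only two rules: a consonant carries an inherent ə unless a vowel sign or virama follows, and a word-final ə is dropped. The character tables are deliberately tiny, the function name `to_ipa` is invented here, and the sketch does not attempt the word-internal schwa deletion seen in છોકરો → છોક્રો.

```python
# Partial, illustrative tables; symbols follow the IPA used in the slides.
CONSONANTS = {"ક": "k", "ગ": "g", "ણ": "ɳ", "ત": "t", "ન": "n",
              "પ": "p", "મ": "m", "ર": "ɾ", "લ": "l", "શ": "ʃ"}
VOWEL_SIGNS = {"\u0abe": "a",   # GUJARATI VOWEL SIGN AA
               "\u0acb": "o"}   # GUJARATI VOWEL SIGN O
VIRAMA = "\u0acd"               # suppresses the inherent vowel
SCHWA = "ə"

def to_ipa(word):
    """Return the list of phones for one Gujarati word (simplified rules)."""
    phones = []
    for i, ch in enumerate(word):
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            following = word[i + 1] if i + 1 < len(word) else ""
            # Inherent schwa unless a vowel sign or virama follows.
            if following not in VOWEL_SIGNS and following != VIRAMA:
                phones.append(SCHWA)
        elif ch in VOWEL_SIGNS:
            phones.append(VOWEL_SIGNS[ch])
        elif ch != VIRAMA:
            raise ValueError(f"character not covered by this sketch: {ch!r}")
    # Drop a word-final inherent schwa.
    if len(phones) > 1 and phones[-1] == SCHWA:
        phones.pop()
    return phones
```

With these two rules alone, રામ yields ɾam and લશ્કર yields ləʃkəɾ; words that need further rules would come out with extra schwas.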
  • 21. Prosodic Translation. Letter-to-sound rules only provide information about the kind of speech sound to be generated. To convey the emotions and expressions residing in the input text, prosody needs to be applied. By prosody we mean amplitude + pitch + duration.
  • 22. Speech Production. This phase is responsible for the actual output of speech. It uses the phonetic and prosodic information provided by the previous phase. Various approaches exist for producing speech.
  • 23. Different ways for Speech Production. Three widely used approaches for speech production are: Articulatory Synthesis; Source-Filter Synthesis; Concatenative Synthesis. The speech production part of a TTS system is generally called the speech engine.
  • 24. Use cases. Having understood the structure of TTS systems, we realized that all three phases are required in order to develop a complete TTS system for Gujarati. At the topmost level of abstraction, a single use case can be conceived for fulfilling the requirement of having a TTS system for Gujarati.
  • 25. Use cases. The topmost use case can then be divided into three further use cases, each fulfilling the requirement of one of the three phases. During the project we tried to realize each use case one by one.
  • 26. Pilot Project. As we approached the various requirements and use cases to be realized, we found that developing a preprocessor is not as significant as developing the other two phases, so we decided to develop it later. We decided to develop the Phonetic-Prosodic Translation phase first, as it can easily be plugged into any already-built …
  • 27. Pilot Project. … speech engine that takes its input as IPA. FreeTTS, IBMJS, Dhvani and Narad were studied. We used the Java Speech API with IBMJS as the speech engine. The input to the engine was provided through the Java Speech Markup Language (JSML).
  • 28. Pilot Project: Objective. To develop a TTS system using an already available speech engine, supplying the engine with the IPA transcription of the target Gujarati Unicode text.
  • 29. Pilot Project: S/W Requirement. A speech engine component which takes IPA and speaks it out.
  • 30. Pilot Project: Design. A number of use cases were conceived, and their implementations were provided as separate Java classes.
  • 31. Pilot Project: Conclusion. We cannot continue developing a TTS system with an “outsider” speech engine, as the accent and other characteristics need to be Gujarati in nature.
  • 32. Starting GTTS from Scratch. From the result of the pilot project we concluded that the speech engine must be developed with Gujarati in mind. The concatenative approach was chosen since it provides naturalness and has a proven track record.
  • 33. Concatenation. In the concatenative approach, already stored segments of sound are joined together to produce the complete speech. Such segments are known as concatenation units. We used partnemes as our concatenation unit.
  • 34. Partnemes. A partneme is a very small segment of sound whose typical length ranges from 8 ms to 100 ms. We get partnemes by cutting recorded speech. But before understanding what a partneme is, we have to understand human speech in greater detail, especially the relation between speech and syllable.
  • 35. How we speak? During normal breathing, the breath-in period is longer than the breath-out period in a complete breath cycle. But when we start speaking, the breath-in period becomes shorter, making way for a longer breath-out period. This is because to speak anything we need some air-flow. We use the air-flow …
  • 36. How we speak?: Human Vocal Tract. … powered by the lungs during breath-out. This air-flow is modified at various points of the human vocal tract, ending up as one or another kind of speech sound (phone). The human vocal tract comprises various organs which, in one way or another, change the air-flow. Human Vocal Tract …
  • 39. How we speak?: Syllable and Speech. During one complete breath cycle we can speak more than one phone. All the phones spoken in just one breath cycle constitute a syllable. A continuous sequence of such syllables forms speech.
  • 40. How we speak?: Syllable Structure. It is important to know the structure of a syllable in order to understand partnemes. Typically a syllable is made up of a vowel as its nucleus with consonants around it. Gujarati employs the following syllable structure: < C + C + C + V + V̯ + C + C >
  • 41. How we speak?: Syllable Structure. < C + C + C + V + V̯ + C + C > where C is a consonant, V is a vowel and V̯ is a non-syllabic vowel. An utterance (spoken word) is made up of a series of such syllables.
  • 42. How we speak?: Syllable Structure. રામ (ɾam) is made up of a single syllable; here the structure becomes < ɾ(C) + a(V) + m(C) >. પત્ર (pətɾ) is also made up of a single syllable; here the structure becomes < p(C) + ə(V) + t(C) + ɾ(C) >. લશ્કર (ləʃkəɾ) is made up of two syllables; here the structure becomes < l(C) + ə(V) + ʃ(C) > < k(C) + ə(V) + ɾ(C) >.
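The syllable template can be checked mechanically by labelling each phone and matching the label string against a pattern. This sketch assumes that every slot except the nucleus vowel is optional, which is consistent with the three examples but is not stated on the slides. The vowel set is partial, "G" is used here as a one-letter label for V̯, and all names are hypothetical.

```python
import re

VOWELS = set("aeiouəɛɔ")          # partial vowel inventory, for illustration
NON_SYLLABIC_MARK = "\u032f"      # combining inverted breve below, as in V̯

# < C + C + C + V + V̯ + C + C >, with only V assumed mandatory.
TEMPLATE = re.compile(r"C{0,3}VG?C{0,2}")

def shape(phones):
    """Label each phone as C, V or G (non-syllabic vowel)."""
    labels = []
    for phone in phones:
        if phone.endswith(NON_SYLLABIC_MARK):
            labels.append("G")
        elif phone in VOWELS:
            labels.append("V")
        else:
            labels.append("C")
    return "".join(labels)

def is_valid_syllable(phones):
    """True if the phones fit the template as one single syllable."""
    return TEMPLATE.fullmatch(shape(phones)) is not None
```

For example, the phones of ɾam have the shape CVC and pass, while the six phones of ləʃkəɾ have the shape CVCCVC and fail as a single syllable, matching the slide's split into two syllables.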
  • 43. How we speak?: Consonants and Vowels. Consonants and vowels are two different kinds of speech sound with different acoustic parameters. To know the exact difference between them, we have to understand how a single vocal tract is capable of producing so many different sounds.
  • 44. How we speak?: Articulation. The air-flow is modified by the articulation of various speech organs of the vocal tract. The exact nature of the speech sound produced during breath-out is determined by: 1. Place of articulation; 2. Manner of articulation.
  • 45. How we speak?: Place of articulation. Place of articulation refers to the exact point in the human vocal tract where articulation happens, e.g. [p] - two lips; [k] - back of tongue with velum; [ɾ] - tip of tongue with alveolar ridge.
  • 46. How we speak?: Manner of articulation. Manner of articulation refers to the degree of constriction made during articulation, e.g. [p] - stop or plosive; [ʧ] - affricate; [ɾ] - tapped; [j] - glide; [o] - vowel (no constriction).
  • 47. How we speak?: Voicedness. If the vocal cords are vibrating (and thus changing the air-flow) while the air-flow travels through the glottis, we get a voiced sound, e.g. [g] - voiced; [k] - unvoiced.
  • 48. How we speak?: Aspiration. Aspiration refers to the state of the vocal cords during the final stage of producing a phone. When we speak an aspirated phone, the vocal cords gradually approach the vibrating state as time goes on (irrespective of the phone's voicedness), e.g. [kʰ] - aspirated; [k] - unaspirated.
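The four properties discussed on these slides (place, manner, voicedness, aspiration) can be collected into one record per phone. The sketch below is an assumed representation, not the project's data model; it fills in only a handful of phones, using the slides' own wording for place and manner.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Phone:
    symbol: str
    place: str
    manner: str
    voiced: bool
    aspirated: bool = False

VELAR = "back of tongue with velum"
PHONES = {
    "p": Phone("p", "two lips", "stop", voiced=False),
    "k": Phone("k", VELAR, "stop", voiced=False),
    "kʰ": Phone("kʰ", VELAR, "stop", voiced=False, aspirated=True),
    "g": Phone("g", VELAR, "stop", voiced=True),
    "ɾ": Phone("ɾ", "tip of tongue with alveolar ridge", "tapped", voiced=True),
}

def contrast(a, b):
    """Return the names of the properties in which two phones differ."""
    return {f.name for f in fields(Phone)
            if f.name != "symbol" and getattr(a, f.name) != getattr(b, f.name)}
```

With this table, `contrast(PHONES["k"], PHONES["g"])` contains only `voiced`, and `contrast(PHONES["k"], PHONES["kʰ"])` contains only `aspirated`, mirroring the pairs given on the slides.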
  • 49. Segmentation and Partneme. Partnemes are obtained by segmenting a recorded syllable. [Figure: sound waveform for ગમન built from partnemes; red lines mark the separations.]
  • 50. Partnemes. As shown, a syllable is logically divided into: null sound to consonant transition; core consonant; consonant to vowel transition; core vowel; vowel to consonant transition; core consonant; consonant to null sound transition.
  • 51. Partnemes. If we can provide the partnemes for each vowel and consonant, we can join them accordingly to produce any complete syllable and hence any utterance, e.g. કરણ - kəɾəɳ: 0_k;k;k_ə;ə;ə_ɾ;ɾ;ɾ_ə;ə;ə_ɳ;ɳ;ɳ_0
  • 52. ભારત - bʰaɾət: 0_bʰ;bʰ;bʰ_a;a;a_ɾ;ɾ;ɾ_ə;ə;ə_t;t;t_0
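The partneme notation in these examples follows a regular pattern: a transition into each phone, then the core of that phone, with `0` standing for the null sound at both ends. A minimal sketch of generating that sequence from a list of phones (the function name `partneme_sequence` is an assumption):

```python
def partneme_sequence(phones):
    """Build the partneme string for a run of phones.

    Each phone contributes a transition from the previous sound
    ("prev_phone") followed by its core ("phone"); "0" is the null sound.
    """
    units = []
    previous = "0"
    for phone in phones:
        units.append(f"{previous}_{phone}")  # transition into the phone
        units.append(phone)                  # core of the phone
        previous = phone
    units.append(f"{previous}_0")            # transition back to null sound
    return ";".join(units)

print(partneme_sequence(["k", "ə", "ɾ", "ə", "ɳ"]))
# prints: 0_k;k;k_ə;ə;ə_ɾ;ɾ;ɾ_ə;ə;ə_ɳ;ɳ;ɳ_0
```

A word of n phones therefore needs 2n + 1 partnemes: n cores and n + 1 transitions.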
  • 53. Core Engine. The speech engine we developed to concatenate such a partneme sequence, based on the given IPA, uses a pair of files. One, called the Voice File, contains the audio data of all the partnemes. The other, called the Voice Info File, serves as an index to the Voice File: it contains the position and length of each partneme in the Voice File.
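The Voice File / Voice Info File pairing amounts to one blob of audio plus an index of (position, length) entries. The actual file formats are not described in the presentation, so the sketch below keeps both in memory and uses made-up byte strings in place of real audio; `build_voice` and `synthesize` are hypothetical names.

```python
def build_voice(partneme_audio):
    """Pack {partneme name: audio bytes} into (voice_data, voice_info).

    voice_info maps each name to its (offset, length) inside voice_data.
    """
    voice = bytearray()
    info = {}
    for name, audio in partneme_audio.items():
        info[name] = (len(voice), len(audio))
        voice.extend(audio)
    return bytes(voice), info

def synthesize(sequence, voice, info):
    """Concatenate the audio of each partneme named in 'a;b;c' order."""
    output = bytearray()
    for name in sequence.split(";"):
        offset, length = info[name]
        output.extend(voice[offset:offset + length])
    return bytes(output)

# Placeholder "audio" for the partnemes of a single vowel.
voice, info = build_voice({"0_a": b"\x01\x02", "a": b"\x03", "a_0": b"\x04\x05"})
speech = synthesize("0_a;a;a_0", voice, info)
```

Keeping the index separate from the audio means the engine can look up any partneme by name without scanning the audio data.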
  • 54. Core Engine. The Core Engine realizes the use case for having a speech engine.
  • 55. Language Dependent Components. Since the Core Engine only understands an IPA sequence, we have to provide a component which translates Gujarati text into an IPA sequence. Preprocessing capabilities also need to be developed for a complete TTS system. Unlike the Core Engine, both of these components are specific to a particular language and …
  • 56. Language Dependent Components. … are therefore kept aside as language dependent components. Preprocessor: as preprocessing should be highly customizable by the end user, we have provided a text file which can be edited to control the functionality of the preprocessor.
  • 57. IPATranscriptor: this component currently provides only the phonetic translation of the given Gujarati text, as complete rules for prosodic translation are not available.
  • 58. Thanks: Prof. Bhartiben Modi, Mr. Ajay Sarvaiya, Mr. Irshad Shaikh, Mr. Mihir Trivedi.
  • 59. Sloka. Grasping meanings through the intellect, the soul joins the mind to the desire to speak. The mind kindles the bodily fire, and that fire drives the vital breath. The driven air strikes the head, reaches the mouth and, passing through the various places there, gives rise to the five kinds of speech sounds through the contribution of tone, duration, place, and external and internal effort. (Paniniya Shiksha, tenth chapter, karika 6, 9.)