2. What is a Text-To-Speech Engine?
A text-to-speech (TTS) system converts normal language text into speech.
It is also known as a speech synthesizer.
e.g. Microsoft Sam.
"The quick brown fox jumps over the lazy dog 1,234,567,890 times. soi"
3. Overview of Text-To-Speech Engine
A text-to-speech system (or "engine") is composed of two parts: a front-end and a
back-end.
The front-end converts raw text containing symbols like numbers and
abbreviations into the equivalent of written-out words. This process is often called
text normalization, pre-processing, or tokenization.
The front-end then assigns phonetic transcriptions to each word, and divides and
marks the text into prosodic units, like phrases, clauses, and sentences.
The process of assigning phonetic transcriptions to words is called text-to-phoneme
or grapheme-to-phoneme conversion. Phonetic transcriptions and
prosody information together make up the symbolic linguistic representation that
is output by the front-end.
The back-end, often referred to as the synthesizer, then converts the symbolic
linguistic representation into sound. In certain systems, this part includes the
computation of the target prosody (pitch contour, phoneme durations), which is
then imposed on the output speech.
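The front-end described above can be sketched in a few lines of Python. This is only a minimal illustration, not the design of any particular engine: the names `TOY_LEXICON`, `ProsodicUnit`, and `front_end` are hypothetical, the lexicon holds just four hand-written entries, and prosodic units are approximated by splitting on punctuation.

```python
import re
from dataclasses import dataclass

# Hypothetical toy pronunciation lexicon (ARPAbet-style symbols, no stress marks).
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "quick": ["K", "W", "IH", "K"],
    "brown": ["B", "R", "AW", "N"],
    "fox": ["F", "AA", "K", "S"],
}


@dataclass
class ProsodicUnit:
    words: list      # written-out words in this phrase
    phonemes: list   # one phoneme list per word
    boundary: str    # "minor" after , or ; and "major" otherwise


def front_end(text):
    """Turn text into a simple symbolic linguistic representation."""
    units = []
    # Each match is a run of non-punctuation followed by optional punctuation.
    for match in re.finditer(r"([^,.!?;]+)([,.!?;]?)", text):
        words = re.findall(r"[a-z']+", match.group(1).lower())
        if not words:
            continue
        # Dictionary lookup stands in for grapheme-to-phoneme conversion;
        # unknown words are marked instead of guessed.
        phonemes = [TOY_LEXICON.get(word, ["<unk>"]) for word in words]
        boundary = "minor" if match.group(2) in {",", ";"} else "major"
        units.append(ProsodicUnit(words, phonemes, boundary))
    return units


if __name__ == "__main__":
    for unit in front_end("The quick fox, the brown fox."):
        print(unit)
```

A back-end would take these units and produce audio. A real front-end would also need rules or a trained model for words missing from the lexicon.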
4. Features and Functions of Text-to-Speech Software
Text-to-speech (TTS) software tools are similar in that they speak text on a
computer. However, they vary widely in their functionality.
Formatting text – Allows you to format digital text you create, download from
the Internet, or scan into your computer similar to a word processing
program.
Speaking what you type – Speaks text as you type to give you support in
writing. Within this function there may be the ability to set the level of
support, such as speaking words or speaking each letter and then the word.
5. Speaking the Text
Continuous reading – reads from where you choose to begin reading and stops
when it reaches the end of the text or you use a stop command
Incremental reading – reads an increment of text such as a word, sentence,
chunk/phrase, or paragraph and stops and waits for you to request another
increment of text read.
Highlighted text – reads just the text you highlight with the cursor. Some TTS
programs read the document from a starting point until the user stops the
program. Other TTS programs read only the text selected and highlighted by the
user.
Voices – TTS software can use one voice or allow you to choose from a
selection of male, female, and even foreign language voices
Reading Speed – you can choose to read faster or slower in precise words per
minute or in speed increments
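As a rough sketch of how incremental reading and reading speed might work, the Python below splits text into increments and estimates how long it takes to speak them at a given rate. The function names `increments` and `estimated_seconds` are hypothetical, the sentence splitter is deliberately naive, and the 150 words-per-minute default is an assumption, not a value from any specific product.

```python
import re


def increments(text, unit="sentence"):
    """Split text into the increments a reader would speak one at a time."""
    if unit == "word":
        return text.split()
    if unit == "sentence":
        # Naive rule: a sentence ends at . ! or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
        return [part.strip() for part in parts if part.strip()]
    if unit == "paragraph":
        # Paragraphs are separated by at least one blank line.
        parts = re.split(r"\n\s*\n", text)
        return [part.strip() for part in parts if part.strip()]
    raise ValueError(f"unknown unit: {unit!r}")


def estimated_seconds(text, words_per_minute=150):
    """Estimate speaking time from the word count and the chosen speed."""
    return len(text.split()) * 60.0 / words_per_minute


if __name__ == "__main__":
    sample = "Hello there. How are you?"
    for chunk in increments(sample, "sentence"):
        # A real reader would speak the chunk, then wait for the next request.
        print(chunk, round(estimated_seconds(chunk), 2))
```

Continuous reading is then the same loop without waiting between increments.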
6. Who will be using speech synthesis?
Screen readers are TTS software tools for users with visual disabilities. Screen readers not only read text files but also
give the user other audible navigation support such as reading the user
interface, indicating where the user’s cursor is on the screen, and indicating
when the user’s cursor has passed over a folder.
Text readers are commercial TTS software tools for users who read below grade
level because of a learning disability, English as a second language, a reading
disability, or low vision.
Stephen Hawking is one of the most famous people to have used speech synthesis to
communicate.
Mobile phones can read aloud incoming text such as SMS messages, e-mail, and notifications.
7. Advantages of Text-To-Speech
Listen to class notes, textbooks, and electronic text.
Facilitates education
Avoids eyestrain from too much reading
Makes proofreading more effective
Learn English, Myanmar or other languages
Prepare for speeches by hearing your work read aloud.
Listen to e-books or e-material during your commute.
Amuse children by letting your PC read stories to them.
Help seniors or those with vision problems.
9. Myanmar Language in the Digital Era
Computers, mobile phones, MP3 players, watches, and other electronic devices
Widely used in social networks and online content.
Access to news, information, and knowledge from international and local sources
Localization & Rich Myanmar Content in Electronic Form.
10. Why do we need Myanmar Text-To-Speech?
Myanmar language users cannot use screen readers.
Screen, file, and text readers do not support Myanmar.
Text-To-Speech features would make it easier for any Myanmar computer user to
work in the Myanmar language.
The open source Myanmar Text-To-Speech Engine initiative will empower
other software vendors who want to integrate it into their
applications. E.g. a mobile phone manufacturer can integrate Myanmar language
support easily, without reinventing the wheel.
Learning the Myanmar language will be easier with a text reader.
11. How shall we develop Myanmar TTS?
Learn
Define Scope of Work
Collect Digital Asset
Discover the best tool to make TTS and plan for the future
Develop
Develop Myanmar Language Model/Tokenization and Grapheme-To-Sound
Train TTS Engine
Test TTS Engine internally/Public Review
Enhance
Review the work plan, look for improvements, and apply the feedback
Invite TTS specialists and improve the engine with third-party opinions
Develop Tools and apply in the real environment (e.g. audio books)
12. Open Source Model of Myanmar TTS
Everyone can participate in the development team
Anyone can guide the project
Anyone can contribute their ideas
Open Source Model of TTS Engine for Myanmar Language.
We realized that 1 consultant, 1 project leader, and 3 developers cannot
complete the Myanmar TTS on their own.
We need you, your feedback, and your contribution.
13. The purpose of a Text-to-Speech system
To convert any text into natural sounding speech.
First, text needs to be normalized. Normalization is the process of
transforming text into a single canonical form; the normalized text is then
parsed into individual tokens.
Next, the text-to-speech system assigns the appropriate phonetic
transcriptions to each word which reflect how text should be pronounced in
any given natural language. The synthesizer then converts the symbolic
linguistic representations into sound.
The last step is to choose the right speech units which ensure the high quality
and natural sound of generated speech.
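The normalization step above can be illustrated with a short Python sketch for English text. It is a simplified assumption of how such a step might look: the names `ABBREVIATIONS`, `number_to_words`, and `normalize_and_tokenize` are hypothetical, only whole numbers below one trillion are handled, and the abbreviation table has just two entries.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion"]

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"e.g.": "for example", "etc.": "et cetera"}


def _below_thousand(n):
    """Spell out 1..999 as a list of words."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if 0 < n < 20:
        words.append(ONES[n])
    return words


def number_to_words(n):
    """Spell out a non-negative integer below one trillion."""
    if n == 0:
        return "zero"
    groups = []
    while n > 0:
        groups.append(n % 1000)  # least significant three digits first
        n //= 1000
    if len(groups) > len(SCALES):
        raise ValueError("number too large for this sketch")
    words = []
    for i in reversed(range(len(groups))):
        if groups[i]:
            words += _below_thousand(groups[i])
            if SCALES[i]:
                words.append(SCALES[i])
    return " ".join(words)


def normalize_and_tokenize(text):
    """Expand abbreviations and numbers, then split into lowercase tokens."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Matches 1,234,567 style numbers as well as plain digit runs.
    text = re.sub(r"\d{1,3}(?:,\d{3})+|\d+",
                  lambda m: number_to_words(int(m.group().replace(",", ""))),
                  text)
    return re.findall(r"[a-z]+", text.lower())


if __name__ == "__main__":
    print(normalize_and_tokenize("The fox jumps 1,234 times."))
```

The resulting tokens are what the phonetic transcription step would receive. Myanmar text would need its own number and abbreviation rules.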
14. Architecture of Myanmar TTS
The minimal unit of sound will be the syllable or syllable chain
က ကာ ကား ကိ ကီ ကီး
မန္တ သစ္စာ ဥမမာ
Word Segmentation or Tokenization
ကလေးတွေကျောင်းကိုကားဖြင့်သွားခဲ့ကြသည်။
ကလေး / တွေ / ကျောင်း / ကို / ကား / ဖြင့် / သွား / ခဲ့ကြသည် / ။
Compose word sounds by concatenating syllable sounds.
က+လေး / တွေ / ကျောင်း / ကို / ကား / ဖြင့် / သွား / ခဲ့+ကြ+သည် / ။
Speed and intonation need to be adjusted between syllables and words.
ကလေးတွေ / ကျောင်းကို / ကားဖြင့် / သွားခဲ့ကြသည်။
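One way to picture concatenation from syllables and syllable chains is a greedy longest-match lookup against an inventory of recorded units. The Python below is only a sketch under that assumption, not the project's actual method: `select_units` is a hypothetical name, the inventory is a made-up set, and romanized placeholder syllables stand in for Myanmar ones.

```python
def select_units(syllables, inventory):
    """Cover the syllable sequence with recorded units, longest chains first."""
    units = []
    i = 0
    while i < len(syllables):
        # Try the longest possible chain starting at position i, then shorter ones.
        for length in range(len(syllables) - i, 0, -1):
            chain = tuple(syllables[i:i + length])
            if chain in inventory:
                units.append(chain)
                i += length
                break
        else:
            # No recording covers this syllable, not even on its own.
            raise KeyError(f"no recorded unit for syllable {syllables[i]!r}")
    return units


if __name__ == "__main__":
    # Hypothetical inventory: three single syllables and one two-syllable chain.
    inventory = {("ka",), ("lay",), ("twe",), ("ka", "lay")}
    print(select_units(["ka", "lay", "twe"], inventory))
```

Preferring longer chains reduces the number of joins, which is where audible discontinuities in speed and intonation tend to appear.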
15. Application of Myanmar TTS
The longest-standing application has been screen readers for people with
visual impairment
TTS is also commonly used by people with dyslexia and other reading difficulties, as well
as by pre-literate children
Speech synthesis techniques are also used in entertainment productions such
as games and animations
Text-to-speech communication aids for people with disabilities have
become widely deployed in mass transit.
Text-to-speech is also used in second-language acquisition
16. Dream for the Myanmar TTS
Text-To-Speech Engine integrated with Mobile Phone, Computer and
Electronic Devices
Voice Command integrated with Mobile Phone, Computer and Electronic
Devices
Everyone can have any Myanmar news, information, and electronic text read aloud by a
screen reader
Text-To-Speech Engine used for public announcements and weather
notifications.
Screen reader functions will be integrated with OCR, so that even text in images can be read
aloud.
17. Thanks for being here and participating in the Project
Sponsorship of the TTS Project by KBZ Group of Companies
Great arrangement by Yangon Education Center for the Blind
Contribution of knowledge by several people
Last but not least, a warm welcome to future contributors.