2. • Team Members:
Nischal E Rao
Bharat Joshi
Suhas Kamath N
Sharath M Puranik
• Project Guide: Prof. Shantharam Nayak
• Carried out at:
R.V. College of Engineering,
Bangalore, India.
3. • Voice Enabled Desktop Interaction and
Control System (VEDICS) is a software
solution for controlling the desktop system
using voice-based commands.
• The system takes audio signals as input,
processes and recognizes them, and executes
the desired action on the desktop system.
4. • All software products should incorporate
accessibility features to enable differently-abled
people to use the software easily and efficiently.
• For persons with physical disabilities, the ability
to simply talk to a computer could be a priceless
asset.
• Hands-free computing can be more convenient than
conventional keyboard-and-mouse I/O.
5. • The user should be able to
o access any element present on the user’s screen.
o run common programs and applications.
o navigate through the file system.
o perform common window operations such as minimize,
maximize and close.
• User commands should be easy to remember and use.
• The user must be able to turn the system on and off
whenever required.
6. • VEDICS follows the MVC design pattern.
• Any speech-to-text converter can be used
with VEDICS.
• VEDICS uses a feedback mechanism to learn what is
being displayed on the desktop.
• Accuracy increases because only words relevant to the
current screen are recognized.
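The converter-independence point above can be sketched as a small interface. This is a minimal illustration, not the VEDICS code: the names `SpeechToTextConverter`, `CannedConverter` and `DesktopController` are hypothetical, and the stand-in converter ignores the audio entirely.

```python
from typing import List, Protocol


class SpeechToTextConverter(Protocol):
    """Anything that turns audio into text under a restricted vocabulary."""

    def recognize(self, audio: bytes, vocabulary: List[str]) -> str: ...


class CannedConverter:
    """Hypothetical stand-in engine: ignores the audio, returns a fixed phrase."""

    def __init__(self, phrase: str) -> None:
        self.phrase = phrase

    def recognize(self, audio: bytes, vocabulary: List[str]) -> str:
        # Only phrases in the current vocabulary may be returned.
        return self.phrase if self.phrase in vocabulary else ""


class DesktopController:
    """Depends only on the converter interface, not on a concrete engine."""

    def __init__(self, converter: SpeechToTextConverter) -> None:
        self.converter = converter
        self.vocabulary = ["open firefox", "close window"]

    def listen(self, audio: bytes) -> str:
        return self.converter.recognize(audio, self.vocabulary)
```

Swapping engines then means passing a different object with the same `recognize` method; the controller itself does not change.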
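7. [Architecture diagram: the Speech-to-text Converter passes the recognized text to the Desktop Control System, which issues a command to the user's desktop. The currently visible objects are fed back to the Desktop Control System, which supplies the grammar and the names of visible elements to the converter.]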
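The feedback loop in this architecture can be simulated in a few lines. The sketch below is only an illustration under loud assumptions: `SCREENS` is a made-up model of which names appear after each command, and `recognize` is a stand-in that accepts an utterance only if it is in the current vocabulary.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical desktop model: names that become visible after each command.
SCREENS = {
    "start": ["Firefox", "Files"],
    "firefox": ["File", "Edit", "Google"],
    "files": ["Documents", "Music"],
}


def recognize(utterance: str, vocabulary: Set[str]) -> Optional[str]:
    """Stand-in recognizer: accepts an utterance only if it is in the vocabulary."""
    word = utterance.strip().lower()
    return word if word in vocabulary else None


def run_cycles(utterances: List[str]) -> List[Tuple[str, str]]:
    visible = SCREENS["start"]
    log = []
    for utterance in utterances:
        # The vocabulary is rebuilt from the screen before every recognition.
        vocabulary = {name.lower() for name in visible}
        word = recognize(utterance, vocabulary)
        if word is None:
            log.append((utterance, "rejected"))
            continue
        # "Execute" the command and observe what is visible afterwards.
        visible = SCREENS.get(word, visible)
        log.append((utterance, "executed"))
    return log
```

With this model, "Google" is rejected while only the start screen is visible and accepted once "Firefox" has been opened, which is the context-dependent behaviour the diagram describes.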
8. • Speech-to-text conversion
[Diagram: Speech To Text Converter]
9. • Grammar and Dictionary are used to
convert sound signals into text.
[Diagram: Grammar and Dictionary feed the Speech To Text Converter]
10. • The recognized text is given as input to
the Desktop Control System.
[Diagram: Speech To Text Converter (using Grammar and Dictionary) -> “Open Firefox” -> Desktop Control System]
11. • The Desktop Control System determines
the command to execute on the desktop.
[Diagram: Speech To Text Converter -> Desktop Control System -> Open_firefox command]
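One way to picture this step is a small function that maps the recognized text to a command identifier. The action list and the `to_command` helper below are assumptions for illustration; the slides do not show how VEDICS actually performs the mapping.

```python
from typing import Optional

# Hypothetical set of actions the control system understands.
ACTIONS = {"open", "close", "minimize", "maximize"}


def to_command(recognized_text: str) -> Optional[str]:
    """Turn text such as "Open Firefox" into an identifier such as "open_firefox".

    Returns None when the text is empty or does not start with a known action.
    """
    words = recognized_text.lower().split()
    if not words or words[0] not in ACTIONS:
        return None
    return "_".join(words)
```

Window operations that need no target, such as "Minimize", map to the bare action name.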
12. • After successful execution, the names of
objects visible on the screen are collected.
[Diagram: the names “File”, “Edit” and “Google” are collected from the screen by the Desktop Control System]
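Collecting the visible names amounts to walking whatever tree of UI elements the platform exposes. The sketch below assumes a hypothetical tree of plain dictionaries with `name`, `visible` and `children` keys; a real system would query the platform's accessibility interface instead.

```python
from typing import List


def visible_names(node: dict) -> List[str]:
    """Depth-first walk returning names of visible nodes; hidden subtrees are skipped."""
    if not node.get("visible", True):
        return []
    names = [node["name"]] if node.get("name") else []
    for child in node.get("children", []):
        names.extend(visible_names(child))
    return names


# Hypothetical snapshot of a browser window after "Open Firefox".
window = {
    "name": "Firefox",
    "children": [
        {"name": "File", "children": [{"name": "New Tab", "visible": False}]},
        {"name": "Edit"},
        {"name": "Google"},
    ],
}
```

Here the closed menu item "New Tab" is left out because it is not currently visible.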
13. • The collected names are used to update
the grammar and the dictionary files.
[Diagram: Desktop Control System -> “File”, “Edit”, “Google” -> Grammar and Dictionary]
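As a rough illustration of the grammar update, the collected names can be written out as a JSGF grammar, a format that Sphinx 4 can read. The slides do not state which grammar format VEDICS writes, so the format choice, the rule name `<element>` and the `build_jsgf` helper are all assumptions.

```python
import re
from typing import Iterable


def build_jsgf(names: Iterable[str], grammar_name: str = "desktop") -> str:
    """Build a one-rule JSGF grammar whose alternatives are the visible names."""
    phrases = []
    for name in names:
        # Keep alphabetic words only, lower-cased, and drop duplicates.
        phrase = " ".join(re.findall(r"[a-z]+", name.lower()))
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    if not phrases:
        raise ValueError("no usable names to put in the grammar")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <element> = {alternatives};\n"
    )
```

Regenerating this file after every executed command keeps the set of recognizable phrases limited to what is on the screen.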
14. • The updated grammar and dictionary files
are used in the next recognition cycle.
[Diagram: Updated Grammar and Updated Dictionary feed the Speech To Text Converter]
15. • VEDICS consists of the following parts:
o Sphinx 4 Sub-system: Open-source tool used to convert
speech to text.
o Desktop Control Sub-system: Translates the converted
text into the corresponding command and executes it on the desktop. It also re-creates
the grammar file based on what is displayed on the screen.
o Logios Tool: Used to generate a new dictionary based on
what is displayed on the screen.
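Before a dictionary can be regenerated, the on-screen names have to be reduced to a list of distinct words. The sketch below shows that preparatory step only; how Logios is actually invoked is not shown on the slides, and the `dictionary_words` helper and its upper-case convention are assumptions.

```python
import re
from typing import Iterable, List, Set, Tuple


def dictionary_words(names: Iterable[str], known: Set[str]) -> Tuple[List[str], List[str]]:
    """Split element names into unique upper-case words.

    Returns (all_words, new_words), both sorted; new_words are the ones
    missing from the existing dictionary and so need a pronunciation.
    """
    words = set()
    for name in names:
        words.update(w.upper() for w in re.findall(r"[A-Za-z]+", name))
    all_words = sorted(words)
    new_words = [w for w in all_words if w not in known]
    return all_words, new_words
```

Only the words in the second list would need to be passed to a pronunciation generator.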
16.
17. • The accuracy of VEDICS depends on the accuracy of Sphinx 4.
• Summary of performance of Sphinx 4:
Parameter                                Performance
Vocabulary size                          79
Word error rate (%)                      1.192
RT ratio, single-CPU configuration*      0.25
RT ratio, dual-CPU configuration*        0.20
* RT ratio: ratio of utterance duration to the time taken to decode the utterance.
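For readers unfamiliar with the word error rate in the table, the usual definition is the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function below is a generic sketch of that definition, not the evaluation script behind the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)
```

For example, recognizing "open firefox browser" when the user said "open the firefox browser" is one deletion in four reference words, a word error rate of 25%.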
18. • Increased accuracy due to the context-aware nature of
VEDICS.
• The small vocabulary further improves accuracy.
• Logios enables recognition of custom words:
words with any sequence of characters can be
recognized.
• Almost all components on the desktop are accessible.
19. • VEDICS can be used to perform most actions that can
be done using a pointing device.
• Using voice to access and control the desktop has many
advantages and can be a boon to
differently-abled people.
• VEDICS can navigate through the file system, open
applications, control desktop windows, and recognize
almost any word.
• VEDICS is context-aware: it determines what
is currently being displayed on the desktop and
dynamically generates the grammar and the dictionary.
20. • Dictation facility: The ability to dictate into a text editor or
text field.
• Artificial Intelligence in VEDICS.
• If two objects on the screen have the same name, the
user should be able to select the right one.
• The user should be able to either pronounce the entire
word or spell individual characters of the word.
• Facility to add custom commands to suit the user.
• Screen Reader Facility.
21. Project Link: http://vedics.sourceforge.net/
References:
• Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh,
Evandro Gouvea, Peter Wolf, Joe Woelfel, “Sphinx-4: A Flexible
Open Source Framework for Speech Recognition”, SML Technical
Report, Sun Microsystems, SMLI TR-2004-139, Nov. 2004.
• Kai-Fu Lee, Hsiao-Wuen Hon, Raj Reddy, “An Overview of the
SPHINX Speech Recognition System”, IEEE Transactions on
Acoustics, Speech and Signal Processing, Vol. 38, No. 1,
Jan. 1990.
• Frank Buschmann, Regine Meunier, Hans Rohnert, Peter
Sommerlad, Michael Stal, “Pattern-Oriented Software Architecture
– Vol. 1: A System of Patterns”, Wiley Publications, 1996.