Mansoura University
Faculty of Engineering
Dept. of Electronics and
Communication Engineering

Smart Blind Stick
A B. Sc. Project in

Electronics and Communications Engineering

Supervised by

Assist. Prof. Mohamed Abdel-Azim
Eng. Ahmed Shabaan, Eng. Mohamed Gamal, Eng. Eman Ashraf

Department of Electronics and Communications Engineering
Faculty of Engineering-Mansoura University

2011-2012
Team Work

No. | Name                          | Contact Information
----|-------------------------------|-----------------------------
1   | Ahmed Helmy Abd-Ghaffar       | Ahmed2033@gmail.com
2   | Nesma Zein El-Abdeen Mohammed | eng_nesma.zein@yahoo.com
3   | Aya Gamal Osman El-Mansy      | eng_tota_20@hotmail.com
4   | Fatma Ayman Mohammed          | angel_whisper89@hotmail.com
5   | Ahmed Moawad Abo-Elenin Awad  | ahmedmowd@gmail.com

Acknowledgement
We would like to express our gratitude to our advisor and supervisor Dr.
Mohammed Abd ElAzim for guiding this work with interest. We would also like
to thank Eng. Ahmed Shaaban, Eng. Mohammed Gamal, and Eng. Eman Ashraf,
teaching assistants, for the countless hours they spent in the labs. We are
grateful to them for setting high standards and giving us the freedom to explore.
We would also like to thank our colleagues for their assistance and constant
support.
Our Team

Abstract
According to the World Health Organization, approximately 36.9 million people
in the world were blind in 2002. The majority of them use a conventional white
cane to aid navigation. The limitation of the white cane is that information is
gained only by touching objects with the tip of the cane, and its traditional
length depends on the height of the user, extending from the floor to the
person's sternum. We therefore design an ultrasound sensor that detects barriers
of any shape or height and warns the user with vibration.

Blind people also face great problems moving from place to place in town, and
the only existing aid is a guide dog, which can cost about $20,000 and remains
useful for only about 5-6 years. We therefore design a GPS aid for blind people
that helps the user move from place to place in town with spoken directions; the
user names the destination by voice alone, with no need to type anything.

To help the user move indoors, or in closed places visited daily, we design an
indoor navigation system that works offline and guides the user, again by voice
commands, between locations in specific places such as homes, malls, and
libraries.

The user may also face great problems controlling electric devices, so we design
a fully wireless control system that lets the user operate all his electric devices
easily by voice. It is connected to a security system that warns the user, whether
indoors or out, if anything goes wrong, and helps him solve the problem.

Contents

Chapter-01: Introduction
  1.1  Problem Definition
  1.2  Problem Solution
  1.3  Business Model
  1.4  Block Diagram
  1.5  Detailed Technical Description
  1.6  Pre-Project Planning
  1.7  Time Planning

Chapter-02: Speech Recognition
  2.1  Introduction
  2.2  Literature Review
       2.2.1  Pattern Recognition
       2.2.2  Generation of Voice
       2.2.3  Voice as Biometric
       2.2.4  Speech Recognition
       2.2.5  Speaker Recognition
       2.2.6  Speech/Speaker Modeling
  2.3  Implementation Details
       2.3.1  Pre-Processing and Feature Extraction
  2.4  Artificial Neural Networks
       2.4.1  Introduction
       2.4.2  Models
       2.4.3  Network Function
       2.4.4  ANN Dependency Graph
       2.4.5  Learning
       2.4.6  Choosing a Cost Function
       2.4.7  Learning Paradigms
       2.4.8  Supervised Learning
       2.4.9  Unsupervised Learning
       2.4.10 Reinforcement Learning
       2.4.11 Learning Algorithms
       2.4.12 Employing Artificial Neural Networks
       2.4.13 Applications
       2.4.14 Types of Models
       2.4.15 Neural Network Software
       2.4.16 Types of Artificial Neural Networks
       2.4.17 Confidence Analysis of Neural Networks

Chapter-03: Image Processing
  3.1  Introduction
       3.1.1  What Is Digital Image Processing?
       3.1.2  Motivating Problems
  3.2  Color Vision
       3.2.1  Fundamentals
       3.2.2  Image Formats Supported by MATLAB
       3.2.3  Working Formats in MATLAB
  3.3  Aspects of Image Processing
  3.4  Image Types
       3.4.1  Intensity Image
       3.4.2  Binary Image
       3.4.3  Indexed Image
       3.4.4  RGB Image
       3.4.5  Multi-Frame Image
  3.5  How To
       3.5.1  How to Convert Between Different Formats
       3.5.2  How to Read a File
       3.5.3  Loading and Saving Variables in MATLAB
       3.5.4  How to Display an Image in MATLAB
  3.6  Some Important Definitions
       3.6.1  Imread Function
       3.6.2  Rotation
       3.6.3  Scaling
       3.6.4  Interpolation
  3.7  Edge Detection
       3.7.1  Canny Edge Detection
       3.7.2  Edge Tracing
  3.8  Mapping
       3.8.1  Mapping an Image onto a Surface: Overview
       3.8.2  Mapping an Image onto Elevation Data
       3.8.3  Initializing the IDL Display Objects
       3.8.4  Displaying Image and Geometric Surface Objects
       3.8.5  Mapping an Image onto a Sphere
  3.9  Mapping Offline

Chapter-04: GPS Navigation
  4.1  Introduction
       4.1.1  What Is GPS?
       4.1.2  How Does It Work?
  4.2  Basic Concepts of GPS
  4.3  Position Calculation
  4.4  Communication
  4.5  Message Format
  4.6  Satellite Frequencies
  4.7  Navigation Equations
  4.8  Bancroft's Method
  4.9  Trilateration
  4.10 Multidimensional Newton-Raphson Calculation
  4.11 Additional Methods for More Than Four Satellites
  4.12 Error Sources and Analysis
  4.13 Accuracy Enhancement and Surveying
       4.13.1 Augmentation
       4.13.2 Precise Monitoring
  4.14 Time Keeping
       4.14.1 Time Keeping and Leap Seconds
       4.14.2 Time Keeping Accuracy
       4.14.3 Time Keeping Format
       4.14.4 Carrier Phase Tracking
  4.15 GPS Navigation

Chapter-05: Ultrasound
  5.1  Introduction
       5.1.1  History
  5.2  Wave Motion
  5.3  Wave Characteristics
  5.4  Ultrasound Intensity
  5.5  Ultrasound Velocity
  5.6  Attenuation of Ultrasound
  5.7  Reflection
  5.8  Refraction
  5.9  Absorption
  5.10 Hardware Part
       5.10.1 Introduction
       5.10.2 Calculating the Distance
       5.10.3 Changing Beam Pattern and Beam Width
       5.10.4 The Development of the Sensor

Chapter-06: Microcontroller
  6.1  Introduction
       6.1.1  History of Microcontrollers
       6.1.2  Embedded Design
       6.1.3  Interrupts
       6.1.4  Programs
       6.1.5  Other Microcontroller Features
       6.1.6  Higher Integration
       6.1.7  Programming Environment
  6.2  Types of Microcontrollers
       6.2.1  Interrupt Latency
  6.3  Microcontroller Embedded Memory Technology
       6.3.1  Data
       6.3.2  Firmware
  6.4  PIC Microcontroller
       6.4.1  Family Core Architecture
  6.5  PIC Components
       6.5.1  Logic Circuit
       6.5.2  Power Supply
  6.6  Development Tools
       6.6.1  Device Programmers
       6.6.2  Debugging
  6.7  LCD Display
       6.7.1  LCD Display Pins
       6.7.2  LCD Screen
       6.7.3  LCD Memory
       6.7.4  LCD Basic Commands
       6.7.5  LCD Connection
       6.7.6  LCD Initialization

Chapter-07: System Implementation
  7.1  Introduction
  7.2  Survey
  7.3  Searches
       7.3.1  Ultrasound Sensor
       7.3.2  Indoor Navigation Systems
       7.3.3  Outdoor Navigation
  7.4  Sponsors
  7.5  Pre-Design
       7.5.1  List of Metrics
       7.5.2  Competitive Benchmarking Information
       7.5.3  Ideal and Marginally Acceptable Target Values
       7.5.4  Time Plan Diagram
  7.6  Design
       7.6.1  Speech Recognition
       7.6.2  Ultrasound Sensors
       7.6.3  Outdoor Navigation
  7.7  Product Architecture
       7.7.1  Product Schematic
       7.7.2  Rough Geometric Layout
       7.7.3  Incidental Interactions
  7.8  Defining Secondary Systems
  7.9  Detailed Interface Specifications
  7.10 Establishing the Architecture of the Chunks

Chapter-08: Conclusion
  8.1  Introduction
  8.2  Overview
       8.2.1  Outdoor Navigation
              8.2.1.1  Outdoor Navigation Online
              8.2.1.2  Outdoor Navigation Offline
       8.2.2  Ultrasound Sensor
       8.2.3  Object Identifier
  8.3  Features
CHAPTER 1
Introduction

1.1 | PROBLEM DEFINITION
According to the World Health Organization, approximately 36.9 million people
in the world were blind in 2002. The majority of them use a conventional white
cane to aid navigation. The limitation of the white cane is that information is
gained only by touching objects with the tip of the cane. The traditional length of
a white cane depends on the height of the user, extending from the floor to the
person's sternum.
Blind people also face great problems moving from place to place in town, and
the only existing aid is a guide dog, which can cost about $20,000 and remains
useful for only about 5-6 years. They also have great difficulty identifying the
objects they frequently use in the house, such as kitchen tools and clothes, and
they may struggle to control their electric devices or to deal with a security
problem on their own.
1.2 | PROBLEM SOLUTION
We are trying to solve all of the previous problems. To help the user move easily
indoors and outdoors, we use an ultrasound sensor to detect barriers in the way
and alert the user in two ways: a vibration motor whose speed increases as the
distance decreases, and a voice alert that announces the distance between the
user and the barrier.

To solve the problem of moving outside the home from place to place, we design
smartphone software that guides the user by voice, without any external help: the
user simply says the destination, and the phone gives spoken directions until it is
reached. To help the user identify objects, we use RFID; every important object
carries a tag with an ID, and when the reader reads the ID, the system announces
by voice what the object is. Inside the home, we design a system to control all
electronic devices by voice commands, together with a security system designed
especially for blind users. Its most important part is the fire alarm: when a fire is
detected, it alerts the user by a call to his mobile phone and places another call to
nearby friends for help. The security system also warns the user if he forgets to
close his door. After finishing these applications, we plan to add features after
graduation, using new technologies to help the user move in the street more
easily, cross roads, and read books. The products available for blind people in the
Egyptian market do not cover any of these needs.


The blind user needs to move, control his surroundings, and do his tasks by
himself without help from anybody, yet all that exists today is a plain white stick
without any technology or features. So, finally, we install a sensor and an RFID
reader on the white stick; the other part is software on the mobile phone that
performs the navigation and automation tasks.
1.3 | BUSINESS MODEL
Our customers are blind and visually impaired people; almost 1 million people in
Egypt have one of the problems described above.

Our product covers several of our customers' needs: it helps them avoid the
barriers in their way and tells them, by voice, the direction to take; and it lets
them move freely, without any external help, in different places through an
Android application on the mobile phone, designed especially for them, that
guides them by voice along roads and tells them which direction to take to reach
their goal.

To reach our goal, we met with different customers to learn exactly what they
need, which helped us form a vision for a final product that is comfortable to use.
We were also guided technically by our sponsors to find the best way to cover all
these needs.

In our market, the available products do not cover any of these needs; we found
only a plain white stick without any technology to help the user.
1.4 | BLOCK DIAGRAMS

Fig.(1.1): General Project Block Diagram


1.5 | DETAILED TECHNICAL DESCRIPTION
Our project is built on the simplest available technologies, in a way that is
comfortable for the user, so we divided it into two parts: software and hardware.
The hardware part consists of a PIC MCU, an MP3 module, a camera module,
and an ultrasound sensor module. The software part is an Android application
installed on the mobile phone.
The hardware part operates under two conditions: indoor and outdoor.
Indoors, a single sensor measures ranges; when the user comes within 2 cm of an
object, the camera module takes a photo of it to detect the code placed on the
object and sends the photo to the MCU, which processes it, identifies the code
number, looks up the object's name in a database, then connects to the WT588D
MP3 module, fetches the address of the MP3 file containing that name, and plays
it out through the speaker.
Outdoors, three HC-SR04 ultrasonic sensors are activated in three directions to
determine the best way (the one with no barriers on it) and send the measured
data to the MCU; the MCU picks the best way and sends the address of the MP3
clip containing the wanted direction, which becomes the output.
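As a rough illustration of this direction-selection idea (a sketch only, not the
actual PIC firmware; the function name and the 100 cm threshold are our own
assumptions), the logic reduces to picking the sensor that reports the largest free
distance:

```python
# Illustrative sketch only -- the real logic runs on the PIC MCU.
# Names and the threshold value are assumptions, not project values.
def choose_direction(left_cm, front_cm, right_cm, threshold_cm=100):
    readings = {"left": left_cm, "front": front_cm, "right": right_cm}
    best = max(readings, key=readings.get)   # direction with most free space
    if readings[best] < threshold_cm:        # every direction is blocked
        return "stop"
    return best                              # maps to an MP3 clip address
```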
For outdoor navigation, we design an Android application using Google Maps:
the user states the destination by voice, the application determines his current
position using GPS, the digital compass provides the viewing angle, and the
application guides him in the right direction using the GPS and compass data.
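A minimal sketch of this guidance computation, assuming the standard
great-circle initial-bearing formula (the function name and signature are ours,
not the application's):

```python
import math

# Sketch: initial bearing from the current GPS fix to the target,
# compared against the digital-compass heading (all names assumed).
def turn_angle(lat1, lon1, lat2, lon2, compass_deg):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    bearing = math.degrees(math.atan2(y, x)) % 360     # 0..360 from north
    return (bearing - compass_deg + 180) % 360 - 180   # -180..180; +ve = turn right
```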

[Figure: mode-selection buttons — left and right buttons choose between the
Outdoor and Indoor modes]

Fig. (1.2): Button Configuration


Fig. (1.3): Indoor & Outdoor Processes Block Diagram

1.6 | PRE-PROJECT PLANNING
We started by searching for a problem that nobody cares about, and we found
that blind people's problems receive little attention and that suitable products are
not available in Egypt. So we judged this a good field to start in: an opportunity
to solve a real problem and to enter a new market segment with a low number of
competitors.
1.7 | TIME PLANNING
Project Timing:
The three main parts are independent in execution time, but each part has many
branches which are serial in execution time.
Timing of Product Introduction:
The timing of launching the product depends on marketing and on re-studying
the market against competing products, which must have low cost and high
quality.

Technology Readiness:
Technology is one of the fundamental components of the product, because
Android and ultrasonic technology are gaining good acceptance among Egyptian
customers.
Market Readiness:
The market is always ready for a new product; products compete in the market
so as to give customers the best one for them.
The Product Plan:
This plan makes the project comfortable in its implementation, because anything
arranged or planned in advance gives the best results.

CHAPTER 2

Speech Recognition

2.1 | INTRODUCTION
Biometrics is, in the simplest definition, something you are. It is a physical
characteristic unique to each individual such as fingerprint, retina, iris, speech.
Biometrics has a very useful application in security; it can be used to authenticate a
person’s identity and control access to a restricted area, based on the premise that
the set of these physical characteristics can be used to uniquely identify
individuals. The speech signal conveys two important types of information:
primarily the speech content, and on a secondary level the speaker identity.
Speech recognizers aim to extract the lexical information from the speech signal
independently of the speaker by reducing the inter-speaker variability. Speaker
recognition, on the other hand, is concerned with extracting the identity of the
person speaking the utterance. So both speech recognition and speaker
recognition are possible from the same voice input.
In our project we use the speech recognition technique, because we want to
recognize the word on which the stick will then act.
Mel Frequency Cepstral Coefficients (MFCCs) are used as features for both
speech and speaker recognition. We also add energy features and the delta and
delta-delta features of the energy and the MFCCs. After calculating the features,
neural networks are used to model the speech. Based on the speech model, the
system decides whether or not the uttered speech matches what the user was
prompted to utter.
2.2 | LITERATURE REVIEW
2.2.1 | Pattern Recognition
Pattern recognition, one of the branches of artificial intelligence, sub-section
of machine learning, is the study of how machines can observe the environment,
learn to distinguish patterns of interest from their background, and make sound and
reasonable decisions about the categories of the patterns. A pattern can be a
fingerprint image, a handwritten cursive word, a human face, a speech signal, a
sales pattern, etc.
The applications of pattern recognition include data mining, document
classification, financial forecasting, organization and retrieval of multimedia
databases, and biometrics (personal identification based on various physical
attributes such as face, retina, speech, ear and fingerprints). The essential steps of
pattern recognition are: Data Acquisition, Preprocessing, Feature Extraction,
Training and Classification.
Features are used to denote the descriptor. Features must be selected so that
they are discriminative and invariant. They can be represented as a vector, matrix,
tree, graph, or string.
They are ideally similar for objects in the same class and very different for
objects in different classes. A pattern class is a family of patterns that share some
common properties. Pattern recognition by machine involves techniques for
assigning patterns to their respective classes automatically and with as little
human intervention as possible.
Learning and Classification usually use one of the following approaches:
Statistical Pattern Recognition is based on statistical characterizations of patterns,
assuming that the patterns are generated by a probabilistic system. Syntactical (or
Structural) Pattern Recognition is based on the structural interrelationships of
features. Given a pattern, its recognition/classification may consist of one of the
following two tasks according to the type of learning procedure:
1) Supervised Classification (e.g., Discriminant Analysis) in which the input pattern
is identified as a member of a predefined class.
2) Unsupervised Classification (e.g., clustering) in which the pattern is assigned to
a previously unknown class.

Fig. (2.1): General block diagram of pattern recognition system


2.2.2 | Generation of Voice
Speech begins with the generation of an airstream, usually by the lungs and
diaphragm, a process called initiation. This air then passes through the larynx,
where it is modulated by the glottis (vocal cords). This step is called phonation or
voicing, and is responsible for the generation of pitch and tone. Finally, the
modulated air is filtered by the mouth, nose, and throat - a process called
articulation - and the resultant pressure wave excites the air.

Fig. (2.2): Vocal Schematic

Depending upon the positions of the various articulators different sounds are
produced. Position of articulators can be modeled by linear time- invariant system
that has frequency response characterized by several peaks called formants. The
change in frequency of formants characterizes the phoneme being articulated.
As a consequence of this physiology, we can notice several characteristics of
the frequency domain spectrum of speech. First of all, the oscillation of the glottis
results in an underlying fundamental frequency and a series of harmonics at
multiples of this fundamental. This is shown in the figure below, where we have
plotted a brief audio waveform for the phoneme /i: / and its magnitude spectrum.
The fundamental frequency (180 Hz) and its harmonics appear as spikes in the
spectrum. The location of the fundamental frequency is speaker dependent, and is
a function of the dimensions and tension of the vocal cords. For adults it usually
falls between 100 Hz and 250 Hz, with the female average significantly higher
than the male average.

Fig. (2.3): Audio Sample for /i: / phoneme showing stationary property of phonemes for a short period

The sound comes out in phonemes, which are the building blocks of speech.
Each phoneme resonates at a fundamental frequency and its harmonics, and thus
has high energy at those frequencies; in other words, each phoneme has different
formants. It is this feature that enables the identification of each phoneme at the
recognition stage.
Fig. (2.4): Audio Magnitude Spectrum for /i:/ phoneme showing fundamental frequency and its harmonics

The variations in inter-speaker features of the speech signal during the utterance
of a word are modeled in word training for speech recognition; for speaker
recognition, the intra-speaker variations in features over long speech content are
modeled.
Besides the configuration of articulators, the acoustic manifestation of a phoneme
is affected by:
- Physiology and emotional state of the speaker.
- Phonetic context.
- Accent.
2.2.3 | Voice as Biometric
The underlying premise for voice authentication is that each person’s voice
differs in pitch, tone, and volume enough to make it uniquely distinguishable.
Several factors contribute to this uniqueness: size and shape of the mouth, throat,
nose, and teeth (articulators) and the size, shape, and tension of the vocal cords.
The chance that all of these are exactly the same in any two people is very low.
Voice biometrics has the following advantages over other forms of biometrics:
- It is a natural signal to produce.
- Implementation cost is low, since it doesn't require a specialized input device.
- It is acceptable to users.
- It mixes easily with other forms of authentication for multifactor
  authentication, and it is the only biometric that allows users to authenticate
  remotely.
2.2.4 | Speech Recognition
Speech is the dominant means for communication between humans, and
promises to be important for communication between humans and machines, if it
can just be made a little more reliable.
Speech recognition is the process of converting an acoustic signal to a set of
words. The applications include voice commands and control, data entry, voice
user interface, automating the telephone operator’s job in telephony, etc. They can
also serve as the input to natural language processing. There are two variants of
speech recognition, based on the duration of the speech signal:
Isolated word recognition, in which each word is surrounded by some sort of
pause, is much easier than recognizing continuous speech, in which words run into
each other and have to be segmented. Speech recognition is a difficult task because
of the many sources of variability associated with the signal. First, the acoustic
realizations of phonemes, the smallest sound units of which words are composed,
are highly dependent on context. Second, acoustic variability can result from
changes in the environment as well as in the position and characteristics of the
transducer. Third, within-speaker variability can result from changes in the
speaker's physical and emotional state, speaking rate, or voice quality. Finally,
differences in sociolinguistic background, dialect, and vocal tract size and shape
contribute to cross-speaker variability. Such variability is modeled in various
ways: at the level of signal representation, a representation that emphasizes
speaker-independent features is developed.
2.2.5 | Speaker Recognition
Speaker recognition is the process of automatically recognizing who is
speaking on the basis of individual’s information included in speech waves.
Speaker recognition can be classified into identification and verification. Speaker
recognition has been applied most often as means of biometric authentication.
2.2.5.1 | Types of Speaker Recognition
Speaker Identification
Speaker identification is the process of determining which registered speaker
provides a given utterance. In a Speaker Identification (SID) system, no identity
claim is provided; the test utterance is scored against a set of known (registered)
references for each potential speaker, and the one whose model best matches the
test utterance is selected. There are two types of speaker identification tasks,
closed-set and open-set. In closed-set identification, the test utterance belongs to
one of the registered speakers.
During testing, a matching score is estimated for each registered speaker. The
speaker corresponding to the model with the best matching score is selected. This
requires N comparisons for a population of N speakers. In open-set, any speaker
can access the system; those who are not registered should be rejected. This
requires another model referred to as garbage model or imposter model or
background model, which is trained with data provided by other speakers different
from the registered speakers.
During testing, the matching score corresponding to the best speaker model is
compared with the matching score estimated using the garbage model in order to
accept or reject the speaker, making the total number of comparisons equal to
N + 1. Speaker identification performance tends to decrease as the population
size increases.
Speaker verification
Speaker verification, on the other hand, is the process of accepting or
rejecting the identity claim of a speaker. That is, the goal is to automatically accept
or reject an identity that is claimed by the speaker. During testing, a verification
score is estimated using the claimed speaker model and the anti-speaker model.
This verification score is then compared to a threshold. If the score is higher than
the threshold, the speaker is accepted, otherwise, the speaker is rejected.
Thus, speaker verification involves a hypothesis test requiring a simple
binary decision: accept or reject the claimed identity regardless of the population
size. Hence, the performance is quite independent of the population size, but it
depends on the number of test utterances used to evaluate the performance of the
system.
2.2.6 | Speaker/Speech Modeling
There are various pattern modeling/matching techniques. They include
Dynamic Time Warping (DTW), Gaussian Mixture Model (GMM), Hidden
Markov Modeling (HMM), Artificial Neural Network (ANN), and Vector
Quantization (VQ). These are used interchangeably for speech and speaker
modeling. A widely used approach is statistical learning: GMM for speaker
recognition, which models the variations in the features of a speaker over a long
sequence of utterances.
And another statistical method widely used for speech recognition is HMM.
HMM models the Markovian nature of speech signal where each phoneme
represents a state and sequence of such phonemes represents a word. Sequence of
Features of such phonemes from different speakers is modeled by HMM.
2.3 | IMPLEMENTATION DETAILS
The implementation of the system includes a common pre-processing and feature
extraction module, plus speaker-independent speech modeling and classification
by ANNs.
2.3.1 | Pre-Processing and Feature Extraction


Starting from the capturing of audio signal, feature extraction consists of the
following steps as shown in the block diagram below:
[Block diagram — Speech Signal → Silence Removal → Pre-emphasis → Framing
→ Windowing → DFT → Mel Filter Bank → Log → IDFT → CMS →
12 MFCC, 12 ΔMFCC, 12 ΔΔMFCC; Energy branch → Delta →
1 energy, 1 Δ energy, 1 ΔΔ energy]

Fig. (2.5): Pre-Processing and Feature Extraction

2.3.1.1 | Capture







The first step in processing speech is to convert the analog representation
(first air pressure, and then analog electric signals in a microphone) into a digital
signal x[n], where n is an index over time. Analysis of the audio spectrum shows
that nearly all energy resides in the band between DC and 4 kHz, and beyond
10 kHz there is virtually no energy whatsoever.
Used sound format:
- 22050 Hz
- 16-bit, signed
- Little endian
- Mono channel
- Uncompressed PCM
2.3.1.2 | End point detection and Silence removal
The captured audio signal may contain silence at different positions, such as the
beginning of the signal, between the words of a sentence, or the end of the signal. If
silent frames are included, modeling resources are spent on parts of the signal
which do not contribute to the identification. The silence present must be removed
before further processing. There are several ways for doing this: most popular are
Short-Time Energy and Zero-Crossing Rate, but they have their own limitations
regarding thresholds that must be set on an ad hoc basis. The algorithm we used
exploits statistical properties of the background noise as well as physiological
aspects of speech production, and does not assume any ad hoc threshold.
It assumes that the background noise present in the utterances is Gaussian in
nature. Usually the first 200 ms or more (we used 4410 samples at a sampling
rate of 22050 samples/sec) of a speech recording corresponds to silence (or
background noise), because the speaker takes some time to start reading when
recording begins.
Endpoint Detection Algorithm:
Step 1:
Calculate the mean (μ) and standard deviation (σ) of the first 200 ms of samples
of the given utterance. The background noise is characterized by this μ and σ.
Step 2:
Go from the first sample to the last sample of the speech recording. For each
sample, check whether the one-dimensional Mahalanobis distance |x − μ|/σ is
greater than 3. If it is, the sample is treated as voiced; otherwise it is
unvoiced/silence. This threshold rejects up to 99.7% of the noise samples, since
P[|x − μ| ≤ 3σ] = 0.997 for a Gaussian distribution, thus accepting only the
voiced samples.
Step 3:
Mark the voiced sample as 1 and unvoiced sample as 0. Divide the whole
speech signal into 10 ms non-overlapping windows. Represent the complete speech
by only zeros and ones.
Step 4:
Consider a window containing M zeros and N ones. If M ≥ N, convert each of
the ones to zeros, and vice versa. This method is adopted keeping in mind that
the speech production system, consisting of the vocal cords, tongue, vocal tract,
etc., cannot change abruptly within the short time window taken here (10 ms).
Step 5:
Collect the voiced part only, according to the samples labeled '1' in the
windowed array, and dump it into a new array. This retrieves the voiced part of
the original speech signal.
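The steps above translate almost directly into code. The following is a minimal
sketch (our own helper, not the project's implementation), assuming a 1-D
NumPy signal sampled at 22050 Hz:

```python
import numpy as np

def remove_silence(x, fs=22050, noise_len=4410, win_ms=10):
    # Step 1: noise statistics from the first ~200 ms (assumed silence).
    mu, sigma = x[:noise_len].mean(), x[:noise_len].std()
    # Steps 2-3: one-dimensional Mahalanobis distance test, 3-sigma rule.
    voiced = np.abs(x - mu) / sigma > 3.0
    # Step 4: majority vote inside 10 ms non-overlapping windows.
    win = int(fs * win_ms / 1000)
    for s in range(0, len(x), win):
        seg = voiced[s:s + win]
        seg[:] = seg.sum() > len(seg) - seg.sum()   # N ones vs M zeros
    # Step 5: keep only the samples labelled voiced.
    return x[voiced]
```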


Fig. (2.6): Input signal to End-point detection system

Fig. (2.7): Output signal from End point Detection System

2.3.1.3 | PCM Normalization
The extracted pulse-code-modulated amplitude values are normalized, to avoid
amplitude variation introduced during capture.
2.3.1.4 | Pre-emphasis
Usually the speech signal is pre-emphasized before any further processing. If we
look at the spectrum of voiced segments like vowels, there is more energy at the
lower frequencies than at the higher frequencies. This drop in energy across
frequencies is caused by the nature of the glottal pulse. Boosting the
high-frequency energy makes information from the higher formants more
available to the acoustic model and improves phone detection accuracy. The
pre-emphasis filter is a first-order high-pass filter. In the time domain, with input
x[n] and 0.9 ≤ α ≤ 1.0, the filter equation is:

y[n] = x[n] − α x[n−1]

We used α = 0.95.
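As a one-line sketch (assuming a 1-D NumPy signal; the helper name is ours):

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    # y[n] = x[n] - alpha * x[n-1]; the first sample is kept unchanged.
    return np.append(x[0], x[1:] - alpha * x[:-1])
```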


Fig. (2.8): Signal before Pre-Emphasis

Fig.(2.9): Signal after Pre-Emphasis

2.3.1.5 | Framing and windowing
Speech is a non-stationary signal, meaning that its statistical properties are not
constant across time. Instead, we want to extract spectral features from a small
window of speech that characterizes a particular subphone, for which we can
make the (rough) assumption that the signal is stationary (i.e., its statistical
properties are constant within this region). We used frame blocks of 23.22 ms
with 50% overlap, i.e., 512 samples per frame.


Fig.(2.10): Frame Blocking of the Signal

The rectangular window (i.e., no window) can cause problems when we do
Fourier analysis: it abruptly cuts off the signal at its boundaries. A good window
function has a narrow main lobe and low side-lobe levels in its transfer function,
and it shrinks the values of the signal toward zero at the window boundaries,
avoiding discontinuities. The most commonly used window function in speech
processing is the Hamming window, defined as follows:

w[n] = 0.54 − 0.46 cos(2πn / (N − 1)),   0 ≤ n ≤ N − 1

where N is the window length (512 samples here).

Fig.(2.11): Hamming window

The extraction of the signal takes place by multiplying the value of the signal at
time n, s_frame[n], by the value of the window at time n, s_w[n]:

y[n] = s_w[n] × s_frame[n]
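A sketch of the frame blocking and windowing described above (names are ours;
it assumes the pre-emphasized NumPy signal y):

```python
import numpy as np

def frame_and_window(y, frame_len=512, overlap=0.5):
    hop = int(frame_len * (1 - overlap))        # 256-sample hop (50% overlap)
    n_frames = 1 + (len(y) - frame_len) // hop
    window = np.hamming(frame_len)              # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = np.stack([y[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * window                      # taper every frame
```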


Fig.(2.12): A single frame before and after windowing

2.3.1.6 | Discrete Fourier Transform
A Discrete Fourier Transform (DFT) of the windowed signal is used to extract
the frequency content (the spectrum) of the current frame. The tool for extracting
spectral information i.e., how much energy the signal contains at discrete
frequency bands for a discrete-time (sampled) signal is the Discrete Fourier
Transform or DFT. The input to the DFT is a windowed signal x[n]...x[m], and the
output, for each of N discrete frequency bands, is a complex number X[k]
representing the magnitude and phase of that frequency component in the original
signal.
X[k] = | Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N} |,   k = 0, 1, …, N−1

The commonly used algorithm for computing the DFT is the Fast Fourier
Transform or in short FFT.
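In code this reduces to a single call over the windowed frames from the previous
step (a sketch, with our own function name):

```python
import numpy as np

def magnitude_spectrum(frames, n_fft=512):
    # One-sided magnitude spectrum per frame: shape (n_frames, n_fft//2 + 1).
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
```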

2.3.1.7 | Mel Filter
For calculating the MFCC, first, a transformation is applied according to the
following formula:
mel(x) = 2595 log10(1 + x / 700)

where x is the linear frequency. Then, a filter bank is applied to the
amplitude of the Mel-scaled spectrum. The Mel frequency warping is most
conveniently done by utilizing a filter bank with filters centered according to Mel
frequencies. The width of the triangular filters varies according to the Mel scale, so
that the log total energy in a critical band around the center frequency is included.
The centers of the filters are uniformly spaced in the Mel scale.

Fig.(2.13): Equally spaced Mel values

The result of the Mel filter bank is information about the distribution of energy
in each Mel-scale band: we obtain a vector of outputs, one per filter, for each
frame.

Fig. (2.14): Triangular filter bank in frequency scale

We have used 30 filters in the filter bank.
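A sketch of such a 30-filter triangular bank (our own construction, assuming
512-point FFT frames at 22050 Hz, following the warping formula above):

```python
import numpy as np

def mel_filter_bank(n_filters=30, n_fft=512, fs=22050):
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Centers uniformly spaced on the Mel scale, mapped back to FFT bins.
    pts = inv_mel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fbank  # apply as: filter_energies = magnitudes @ fbank.T
```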


2.3.1.8 | Cepstrum by Inverse Discrete Fourier Transform
The cepstrum transform is applied to the filter outputs in order to obtain the
MFCC features of each frame. The triangular filter outputs Y(i), i = 1, 2, …, M
are compressed using the logarithm, and the discrete cosine transform (DCT) is
applied. Here, M is the number of filters in the filter bank, i.e., 30.

C[n] = Σ_{i=1}^{M} log(Y(i)) · cos(πn(i − 0.5) / M)

where C[n] is the MFCC vector for each frame.
The resulting vector is called the Mel-frequency cepstrum (MFC), and the
individual components are the Mel-frequency Cepstral coefficients (MFCCs). We
extracted 12 features from each speech frame.
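A sketch of this log-compression and DCT step (SciPy's dct is used as an
assumed convenience; skipping the 0th coefficient is our illustrative choice):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_filter_bank(filter_energies, n_ceps=12):
    log_e = np.log(filter_energies + 1e-10)     # log-compress, avoid log(0)
    # Type-II DCT along the filter axis; keep 12 coefficients (skip C0).
    return dct(log_e, type=2, axis=1, norm="ortho")[:, 1:n_ceps + 1]
```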
2.3.1.9 | Post Processing
Cepstral Mean Subtraction (CMS)
A speech signal may be subjected to some channel noise when recorded, also
referred to as the channel effect. A problem arises if the channel effect when
recording training data for a given person is different from the channel effect in
later recordings when the person uses the system. The problem is that a false
distance between the training data and newly recorded data is introduced due to the
different channel effects. The channel effect is eliminated by subtracting the
mean Mel-cepstrum coefficients from the Mel-cepstrum coefficients:

ĉ_t(n) = c_t(n) − (1/T) Σ_{τ=1}^{T} c_τ(n)

where T is the number of frames in the utterance.
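In code, CMS is a single line over the (frames × coefficients) MFCC matrix from
the previous step (the variable name is assumed):

```python
mfcc_cms = mfcc - mfcc.mean(axis=0)   # subtract per-coefficient utterance mean
```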

The energy feature
The energy in a frame is the sum over time of the power of the samples in the
frame; thus for a signal x in a window from time sample t1 to time sample t2 the
energy is:
E = Σ_{t=t1}^{t2} x²[t]

Delta feature
Another interesting fact about the speech signal is that it is not constant from
frame to frame. Co-articulation (influence of a speech sound during another
adjacent or nearby speech sound) can provide a useful cue for phone identity. It
can be preserved by using delta features. Velocity (delta) and acceleration
(delta-delta) coefficients are usually obtained from the static window-based
information. These delta and delta-delta coefficients model the speed and
acceleration of the variation of the cepstral feature vectors across adjacent
windows. A simple way to compute deltas would be just to compute the
difference between frames; thus the delta value d(t) for a particular cepstral
value c(t) at time t can be estimated as:
d(t) = (c(t+1) − c(t−1)) / 2
The differencing method is simple, but since it acts as a high-pass filtering
operation on the parameter domain, it tends to amplify noise. The solution is
linear regression, i.e., fitting a first-order polynomial; the least-squares solution
is easily shown to be of the following form:
d[t] = ( Σ_{m=1}^{M} m · (c[t+m] − c[t−m]) ) / ( 2 Σ_{m=1}^{M} m² )

where M is the regression window size. We used M = 4.
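A sketch of these regression-based deltas with M = 4 (edge frames padded by
repetition; all names are ours):

```python
import numpy as np

def deltas(feats, M=4):
    # feats: (n_frames, n_features); returns an array of the same shape.
    denom = 2 * sum(m * m for m in range(1, M + 1))
    padded = np.pad(feats, ((M, M), (0, 0)), mode="edge")
    d = np.zeros_like(feats, dtype=float)
    for m in range(1, M + 1):
        d += m * (padded[M + m:M + m + len(feats)]      # c[t+m]
                  - padded[M - m:M - m + len(feats)])   # c[t-m]
    return d / denom

# Delta-deltas are simply deltas(deltas(feats)).
```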








Composition of Feature Vector
We calculated 39 features from each frame:
- 12 MFCC features.
- 12 delta MFCC.
- 12 delta-delta MFCC.
- 1 energy feature.
- 1 delta energy feature.
- 1 delta-delta energy feature.
2.4 | ARTIFICIAL NEURAL NETWORKS
2.4.1 | Introduction
We have used ANNs to model our system: voices are used for training, and test
utterances are classified into word categories, each of which triggers an action.
Here we give an overview of artificial neural networks.
The original inspiration for the term Artificial Neural Network came from
examination of central nervous systems and their neurons, axons, dendrites, and
synapses, which constitute the processing elements of biological neural networks
investigated by neuroscience. In an artificial neural network, simple artificial
nodes, variously called "neurons", "neurodes", "processing elements" (PEs) or
"units", are connected together to form a network of nodes mimicking the
biological neural networks — hence the term "artificial neural network".
Because neuroscience is still full of unanswered questions, and since there are
many levels of abstraction and therefore many ways to take inspiration from the
brain, there is no single formal definition of what an artificial neural network is.
Generally, it involves a network of simple processing elements that exhibit
complex global behavior determined by connections between processing elements
and element parameters. While an artificial neural network does not have to be
adaptive per se, its practical use comes with algorithms designed to alter the
strength (weights) of the connections in the network to produce a desired signal
flow.
These networks are also similar to the biological neural networks in the sense
that functions are performed collectively and in parallel by the units, rather than
there being a clear delineation of subtasks to which various units are assigned (see
also connectionism). Currently, the term Artificial Neural Network (ANN) tends to
refer mostly to neural network models employed in statistics, cognitive psychology
and artificial intelligence. Neural network models designed with emulation of the
central nervous system (CNS) in mind are a subject of theoretical neuroscience and
computational neuroscience.
In modern software implementations of artificial neural networks, the
approach inspired by biology has been largely abandoned for a more practical
approach based on statistics and signal processing. In some of these systems,
neural networks or parts of neural networks (such as artificial neurons) are used as
components in larger systems that combine both adaptive and non-adaptive
elements. While the more general approach of such adaptive systems is more
suitable for real-world problem solving, it has far less to do with the traditional
artificial intelligence connectionist models. What they do have in common,
however, is the principle of non-linear, distributed, parallel and local processing
and adaptation. Historically, the use of neural network models marked a
paradigm shift in the late eighties from high-level (symbolic) artificial
intelligence, characterized by expert systems with knowledge embodied in
if-then rules, to low-level (sub-symbolic) machine learning, characterized by
knowledge embodied in the parameters of a dynamical system.
2.4.2 | Models


Neural network models in artificial intelligence are usually referred to as
artificial neural networks (ANNs); these are essentially simple mathematical
models defining a function f : X → Y or a distribution over X or over both X and
Y, but sometimes models are also intimately associated with a particular learning
algorithm or learning rule. A common use of the phrase ANN model really means
the definition of a class of such functions (where members of the class are
obtained by varying parameters, connection weights, or specifics of the
architecture such as the number of neurons or their connectivity).
2.4.3 | Network Function
The word network in the term 'artificial neural network' refers to the inter–
connections between the neurons in the different layers of each system. An
example system has three layers. The first layer has input neurons, which send data
via synapses to the second layer of neurons, and then via more synapses to the
third layer of output neurons. More complex systems will have more layers of
neurons with some having increased layers of input neurons and output neurons.
The synapses store parameters called "weights" that manipulate the data in the
calculations. An ANN is typically defined by three types of parameters:
 The interconnection pattern between different layers of neurons
 The learning process for updating the weights of the interconnections
 The activation function that converts a neuron's weighted input to its output
activation.
Mathematically, a neuron's network function is defined as a composition of
other functions, which can further be defined as a composition of other functions.
This can be conveniently represented as a network structure, with arrows depicting
the dependencies between variables. A widely used type of composition is the
nonlinear weighted sum, f(x) = K(Σ_i w_i g_i(x)), where K (commonly referred
to as the activation function) is some predefined function, such as the hyperbolic
tangent. It will be convenient for the following to refer to a collection of
functions g_i as simply a vector g = (g_1, g_2, …, g_n).
2.4.4 | ANN dependency graph
This figure depicts such a decomposition of f, with dependencies between
variables indicated by arrows. These can be interpreted in two ways.
The first view is the functional view: the input x is transformed into a
3-dimensional vector h, which is then transformed into a 2-dimensional vector g,
which is finally transformed into f. This view is most commonly encountered in
the context of optimization.


The second view is the probabilistic view: the random variable F = f(G) depends
upon the random variable G = g(H), which depends upon H = h(X), which
depends upon the random variable X. This view is most commonly encountered
in the context of graphical models.
The two views are largely equivalent. In either case, for this particular network
architecture, the components of individual layers are independent of each other
(e.g., the components of g are independent of each other given their input h).
This naturally enables a degree of parallelism in the implementation.

[Figure: two separate depictions of the recurrent ANN dependency graph]
Networks such as the previous one are commonly called feedforward, because
their graph is a directed acyclic graph. Networks with cycles are commonly
called recurrent. Such networks are commonly depicted in the manner shown at
the top of the figure, where f is shown as being dependent upon itself. However,
an implied temporal dependence is not shown.
2.4.5 | Learning
What has attracted the most interest in neural networks is the possibility of
learning. Given a specific task to solve and a class of functions F, learning means
using a set of observations to find f* ∈ F which solves the task in some optimal
sense. This entails defining a cost function C : F → ℝ such that, for the optimal
solution f*, C(f*) ≤ C(f) for all f ∈ F; i.e., no solution has a cost less than the
cost of the optimal solution (see mathematical optimization).
The cost function is an important concept in learning, as it is a measure of
how far away a particular solution is from an optimal solution to the problem to be
solved. Learning algorithms search through the solution space to find a function
that has the smallest possible cost.
For applications where the solution is dependent on some data, the cost must
necessarily be a function of the observations; otherwise we would not be modeling
anything related to the data. It is frequently defined as a statistic to which only
approximations can be made. As a simple example, consider the problem of
finding the model f which minimizes C = E[(f(x) − y)²], for data pairs (x, y)
drawn from some distribution D. In practical situations we would only have N
samples from D; thus, for the above example, we would only minimize
Ĉ = (1/N) Σ_{i=1}^{N} (f(x_i) − y_i)². Hence the cost is minimized over a
sample of the data rather than the entire data set.


When N → ∞, some form of online machine learning must be used, where the
cost is partially minimized as each new example is seen. While online machine
learning is often used when D is fixed, it is most useful in the case where the
distribution changes slowly over time. In neural network methods, some form of
online machine learning is frequently used for finite datasets.
2.4.6 | Choosing a cost function
While it is possible to define some arbitrary, ad hoc cost function, frequently a
particular cost will be used, either because it has desirable properties (such as
convexity) or because it arises naturally from a particular formulation of the
problem (e.g., in a probabilistic formulation the posterior probability of the model
can be used as an inverse cost). Ultimately, the cost function will depend on the
desired task. An overview of the three main categories of learning tasks is provided
below.
2.4.7 | Learning paradigms
There are three major learning paradigms, each corresponding to a particular
abstract learning task. These are supervised learning, unsupervised learning and
reinforcement learning.
2.4.8 | Supervised learning
In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y,
and the aim is to find a function f : X → Y in the allowed class of functions that
matches the examples. In other words, we wish to infer the mapping implied by
the data; the cost function is related to the mismatch between our mapping and
the data, and it implicitly contains prior knowledge about the problem domain.
A commonly used cost is the mean-squared error, which tries to minimize the
average squared error between the network's output, f(x), and the target value y
over all the example pairs. When one tries to minimize this cost using gradient
descent for the class of neural networks called multilayer perceptrons, one obtains
the common and well-known back-propagation algorithm for training neural
networks.
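As a toy sketch of this setting (a one-hidden-layer network trained by gradient
descent on the mean-squared error; every name and hyperparameter here is our
own illustration, not the project's network):

```python
import numpy as np

def train_mlp(X, Y, hidden=16, lr=0.1, epochs=1000):
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.1, (hidden, Y.shape[1]))
    for _ in range(epochs):
        H = np.tanh(X @ W1)                 # hidden activations
        err = H @ W2 - Y                    # d(MSE)/d(output), up to a constant
        grad_W2 = H.T @ err / len(X)        # output-layer gradient
        dH = (err @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
        W1 -= lr * X.T @ dH / len(X)        # hidden-layer update
        W2 -= lr * grad_W2                  # output-layer update
    return W1, W2
```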
Tasks that fall within the paradigm of supervised learning are pattern
recognition (also known as classification) and regression (also known as function
approximation). The supervised learning paradigm is also applicable to sequential
data (e.g., for speech and gesture recognition). This can be thought of as learning
with a "teacher," in the form of a function that provides continuous feedback on the
quality of solutions obtained thus far.
2.4.9 | Unsupervised learning
In unsupervised learning, some data x is given together with a cost function to be
minimized, which can be any function of the data x and the network's output f(x).
The cost function is dependent on the task (what we are trying to model) and
our a priori assumptions (the implicit properties of our model, its parameters and
the observed variables).
As a trivial example, consider the model f(x) = a, where a is a constant, and the
cost C = E[(x − f(x))²]. Minimizing this cost will give us a value of a that is
equal to the mean of the data.
The cost function can be much more complicated. Its form depends on the
application: for example, in compression it could be related to the mutual
information between x and f(x), whereas in statistical modeling it could be
related to the posterior probability of the model given the data. (Note that in both
of those examples those quantities would be maximized rather than minimized.)
Tasks that fall within the paradigm of unsupervised learning are in general
estimation problems; the applications include clustering, the estimation of
statistical distributions, compression and filtering.
2.4.10 | Reinforcement learning
In reinforcement learning, data are usually not given, but generated by an
agent's interactions with the environment. At each point in time, the agent performs
an action and the environment generates an observation and an instantaneous cost,
according to some (usually unknown) dynamics. The aim is to discover a policy
for selecting actions that minimizes some measure of a long-term cost; i.e., the
expected cumulative cost. The environment's dynamics and the long-term cost for
each policy are usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process
(MDP) with states and actions, and with the following probability distributions:
the instantaneous cost distribution, the observation distribution, and the
transition distribution, while a policy is defined as the conditional distribution
over actions given the observations. Taken together, the two define a Markov
chain (MC). The aim is to


discover the policy that minimizes the cost; i.e., the MC for which the cost is
minimal.
ANNs are frequently used in reinforcement learning as part of the overall
algorithm. Dynamic programming has been coupled with ANNs (Neuro dynamic
programming) by Bertsekas and Tsitsiklis and applied to multi-dimensional
nonlinear problems such as those involved in vehicle routing or natural resources
management because of the ability of ANNs to mitigate losses of accuracy even
when reducing the discretization grid density for numerically approximating the
solution of the original control problems.
Tasks that fall within the paradigm of reinforcement learning are control
problems, games and other sequential decision making tasks.
2.4.11 | Learning algorithms
Training a neural network model essentially means selecting one model from
the set of allowed models (or, in a Bayesian framework, determining a distribution
over the set of allowed models) that minimizes the cost criterion. There are
numerous algorithms available for training neural network models; most of them
can be viewed as a straightforward application of optimization theory and
statistical estimation.
Most of the algorithms used in training artificial neural networks employ some
form of gradient descent. This is done by simply taking the derivative of the cost
function with respect to the network parameters and then changing those
parameters in a gradient-related direction.
Evolutionary methods, simulated annealing, expectation-maximization, nonparametric methods and particle swarm optimization are some commonly used
methods for training neural networks.
2.4.12 | Employing artificial neural networks
Perhaps the greatest advantage of ANNs is their ability to be used as an
arbitrary function approximation mechanism that 'learns' from observed data.
However, using them is not so straightforward and a relatively good understanding
of the underlying theory is essential.
Choice of model: This will depend on the data representation and the
application. Overly complex models tend to lead to problems with learning.


Learning algorithm: There are numerous trade-offs between learning
algorithms. Almost any algorithm will work well with the correct hyperparameters
for training on a particular fixed data set. However, selecting and tuning an
algorithm for training on unseen data requires a significant amount of
experimentation.
Robustness: If the model, cost function and learning algorithm are selected
appropriately the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online
learning and large data set applications. Their simple implementation and the
existence of mostly local dependencies exhibited in the structure allows for fast,
parallel implementations in hardware.
2.4.13 | Applications
The utility of artificial neural network models lies in the fact that they can be
used to infer a function from observations. This is particularly useful in
applications where the complexity of the data or task makes the design of such a
function by hand impractical.
2.4.13.1 | Real-life applications
The tasks artificial neural networks are applied to tend to fall within the
following broad categories:
• Function approximation, or regression analysis, including time series prediction,
fitness approximation and modeling.
• Classification, including pattern and sequence recognition, novelty detection and
sequential decision making.
• Data processing, including filtering, clustering, blind source separation and
compression.
• Robotics, including directing manipulators and computer numerical control.
Application areas include system identification and control (vehicle control,
process control, natural resources management), quantum chemistry, game-playing
and decision making (backgammon, chess, poker), pattern recognition (radar
systems, face identification, object recognition and more), sequence recognition
(gesture, speech, handwritten text recognition), medical diagnosis, financial


applications (automated trading systems), data mining (or knowledge discovery in
databases, "KDD"), visualization and e-mail spam filtering.
Artificial neural networks have also been used to diagnose several cancers.
An ANN-based hybrid lung cancer detection system named HLND improves the
accuracy of diagnosis and the speed of lung cancer radiology. These networks have
also been used to diagnose prostate cancer. The diagnoses can be used to make
specific models, taken from a large group of patients, compared with the information
of one given patient.
The models do not depend on assumptions about correlations between different
variables. Colorectal cancer has also been predicted using neural networks.
Neural networks can predict the outcome for a patient with colorectal cancer
with significantly more accuracy than current clinical methods. After training, the
networks could predict multiple patient outcomes from unrelated institutions.
2.4.13.2 | Neural networks and neuroscience
Theoretical and computational neuroscience is the field concerned with the
theoretical analysis and computational modeling of biological neural systems.
Since neural systems are intimately related to cognitive processes and behavior, the
field is closely related to cognitive and behavioral modeling.
The aim of the field is to create models of biological neural systems in order
to understand how biological systems work. To gain this understanding,
neuroscientists strive to make a link between observed biological processes (data),
biologically plausible mechanisms for neural processing and learning (biological
neural network models) and theory (statistical learning theory and information
theory).
2.4.14 | Types of models
Many models are used in the field, defined at different levels of abstraction
and modeling different aspects of neural systems. They range from models of the
short-term behavior of individual neurons, models of how the dynamics of neural
circuitry arise from interactions between individual neurons and finally to models
of how behavior can arise from abstract neural modules that represent complete
subsystems. These include models of the long-term, and short-term plasticity, of
neural systems and their relations to learning and memory from the individual
neuron to the system level.


2.4.15 | Neural network software
Neural network software is used to simulate, research, develop and apply
artificial neural networks, biological neural networks and, in some cases, a wider
array of adaptive systems.
2.4.16 | Types of artificial neural networks
Artificial neural network types vary from those with only one or two layers of
single-direction logic to complicated multi-input, multi-directional feedback loops
and layers. On the whole, these systems use algorithms in their programming to
determine control and organization of their functions. Some may be as simple as a
one neuron layer with an input and an output, and others can mimic complex
systems such as dANN, which can mimic chromosomal DNA through sizes at
cellular level, into artificial organisms and simulate reproduction, mutation and
population sizes.
Most systems use "weights" to change the parameters of the throughput and
the varying connections to the neurons. Artificial neural networks can be
autonomous and learn by input from outside "teachers", or even be self-teaching
from written-in rules.
2.4.17 | Confidence analysis of a neural network
Supervised neural networks that use an MSE cost function can use formal
statistical methods to determine the confidence of the trained model. The MSE on
a validation set can be used as an estimate for variance. This value can then be
used to calculate the confidence interval of the output of the network, assuming a
normal distribution. A confidence analysis made this way is statistically valid as
long as the output probability distribution stays the same and the network is not
modified.
By assigning a softmax activation function on the output layer of the neural
network (or a softmax component in a component-based neural network) for
categorical target variables, the outputs can be interpreted as posterior
probabilities. This is very useful in classification as it gives a certainty measure on
classifications.
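The following minimal Matlab sketch illustrates both calculations; the validation errors and output activations are stand-in values, not results from this project.

% 95% confidence interval from the validation MSE (assumes the output
% errors are approximately normally distributed).
errVal = randn(100,1) * 0.1;           % stand-in validation errors
sigma  = sqrt(mean(errVal.^2));        % sqrt of MSE estimates the std. deviation
yNew   = 0.7;                          % some network output
ci95   = [yNew - 1.96*sigma, yNew + 1.96*sigma];

% Softmax over the output-layer activations gives posterior probabilities.
z = [2.0; 1.0; 0.1];                   % stand-in output activations
p = exp(z - max(z)) / sum(exp(z - max(z)));  % numerically stable softmax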

CHAPTER 3
Image Processing


3.1 | INTRODUCTION
This chapter is an introduction on how to handle images in Matlab. When
working with images in Matlab, there are many things to keep in mind such as
loading an image, using the right format, saving the data as different data types,
how to display an image, conversion between different image formats, etc. This
worksheet presents some of the commands designed for these operations. Most of
these commands require you to have the Image processing tool box installed with
MATLAB. To find out if it is installed, type ver at the Matlab prompt. This gives
you a list of the tool boxes that are installed on your system.
For further reference on image handling in Matlab you are recommended to
use Matlab's help browser. There is an extensive (and quite good) on-line manual
for the Image processing tool box that you can access via Matlab's help browser.
The first sections of this worksheet are quite heavy. The only way to
understand how the presented commands work, is to carefully work through the
examples given at the end of the worksheet. Once you can get these examples to
work, experiment on your own using your favorite image!
3.1.1 | What Is Digital Image Processing?
Transforming digital information representing images.
3.1.2 | Motivating Problems:
1. Improve pictorial information for human interpretation.
2. Remove noise.
3. Correct for motion, camera position, and distortion.
4. Enhance by changing contrast and color.
5. Segmentation - dividing an image up into constituent parts.
6. Representation - representing an image by some more abstract models.
7. Classification.
8. Reduce the size of image information for efficient handling.
9. Compression with loss of digital information that minimizes loss of "perceptual"
information (JPEG, GIF, MPEG).


3.2 | COLOR VISION
The color-responsive chemicals in the cones are called cone pigments and are
very similar to the chemicals in the rods. The retinal portion of the chemical is the
same; however, the scotopsin is replaced with photopsins. Therefore, the
color-responsive pigments are made of retinal and photopsins. There are three kinds
of color-sensitive pigments:
• Red-sensitive pigment
• Green-sensitive pigment
• Blue-sensitive pigment
Each cone cell has one of these pigments so that it is sensitive to that color.
The human eye can sense almost any gradation of color when red, green and blue
are mixed.
The wavelengths of the three types of cones (red, green and blue) are shown.
The peak absorbance of blue-sensitive pigment is 445 nanometers, for
green-sensitive pigment it is 535 nanometers, and for red-sensitive pigment it is 570
nanometers.
MATLAB stores most images as two-dimensional arrays (i.e., matrices), in
which each element of the matrix corresponds to a single pixel in the displayed
image. For example, an image composed of 200 rows and 300 columns of different
colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images,
such as RGB, require a three dimensional array, where the first plane in the 3rd
dimension represents the red pixel intensities, the second plane represents the
green pixel intensities, and the third plane represents the blue pixel intensities.
To reduce memory requirements, MATLAB supports storing image data in
arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit
unsigned integers. These arrays require one-eighth or one-fourth as much memory
as data in double arrays. An image whose data matrix has class uint8 is called an
8-bit image; an image whose data matrix has class uint16 is called a 16-bit image.
3.2.1 | Fundamentals
A digital image is composed of pixels which can be thought of as small dots
on the screen. A digital image is an instruction of how to color each pixel. We will
see in detail later on how this is done in practice. A typical size of an image is
512-by-512 pixels. Later on in the course you will see that it is convenient to let the


dimensions of the image be a power of 2. For example, 2^9 = 512. In the general
case we say that an image is of size m-by-n if it is composed of m pixels in the
vertical direction and n pixels in the horizontal direction.
Let us say that we have an image in the format 512-by-1024 pixels. This
means that the data for the image must contain information about 524288 pixels,
which requires a lot of memory! Hence, compressing images is essential for
efficient image processing. You will later on see how Fourier analysis and Wavelet
analysis can help us to compress an image significantly. There are also a few
"computer scientific" tricks (for example entropy coding) to reduce the amount of
data required to store an image.
3.2.2 | Image Formats Supported By Matlab
The following image formats are supported by Matlab:
• BMP
• HDF
• JPEG
• PCX
• TIFF
• XWD
Most images you find on the Internet are JPEG-images which is the name for
one of the most widely used compression standards for images. If you have
stored an image you can usually see from the suffix what format it is stored in. For
example, an image named myimage.jpg is stored in the JPEG format and we will
see later on that we can load an image of this format into Mat lab.
3.2.3 | Working Formats In Matlab:
If an image is stored as a JPEG-image on your disc we first read it into
Matlab. However, in order to start working with an image, for example perform a
wavelet transform on the image, we must convert it into a different format. This
section explains four common formats.
3.3 | ASPECTS OF IMAGE PROCESSING


Image Enhancement: Processing an image so that the result is more suitable for a
particular application (sharpening or deblurring an out-of-focus image, highlighting
edges, improving image contrast, brightening an image, removing noise).
Image Restoration: This may be considered as reversing the damage done to an
image by a known cause (removing blur caused by linear motion, removal of
optical distortions).
Image Segmentation: This involves subdividing an image into constituent parts,
or isolating certain aspects of an image (finding lines, circles, or particular shapes
in an image; in an aerial photograph, identifying cars, trees, buildings, or roads).
3.4 | IMAGE TYPES
3.4.1 | Intensity Image (Gray Scale Image)
This is the equivalent to a "gray scale image" and this is the image we will
mostly work with in this course. It represents an image as a matrix where every
element has a value corresponding to how bright/dark the pixel at the
corresponding position should be colored. There are two ways to represent the
number that represents the brightness of the pixel: The double class (or data type).
This assigns a floating number ("a number with decimals") between 0 and 1 to
each pixel. The value 0 corresponds to black and the value 1 corresponds to white.
The other class is called uint8 which assigns an integer between 0 and 255 to
represent the brightness of a pixel. The value 0 corresponds to black and 255 to
white. The class uint8 only requires roughly 1/8 of the storage compared to the
class double. On the other hand, many mathematical functions can only be applied
to the double class. We will see later how to convert between double and uint8.
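As a quick preview, the short sketch below illustrates the two classes and their memory cost; the file name is a placeholder, and im2double/im2uint8 require the Image processing tool box.

A8 = imread('myimage.jpg');   % typically of class uint8, values 0..255
Ad = im2double(A8);           % class double, values 0..1 (divides by 255)
B8 = im2uint8(Ad);            % back to uint8, values 0..255
whos A8 Ad                    % Ad needs roughly 8 times the memory of A8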

Fig. (3.1)


3.4.2 | Binary Image:
This image format also stores an image as a matrix but can only color a pixel
black or white (and nothing in between). It assigns a 0 for black and a 1 for white.
3.4.3 | Indexed Image:
This is a practical way of representing color images. (In this course we will
mostly work with gray scale images but once you have learned how to work with a
gray scale image you will also know the principle how to work with color images.)
An Indexed image stores an image as two matrices. The first matrix has the same
size as the image and one number for each pixel. The second matrix is called the
color map and its size may be different from the image. Each number in the first
matrix is an instruction of which entry to use in the color map matrix.

Fig. (3.2)

3.4.4 | RGB Image
This is another format for color images. It represents an image with three
matrices of sizes matching the image format. Each matrix corresponds to one of
the colors red, green or blue and gives an instruction of how much of each of these
colors a certain pixel should use.
3.4.5 | Multi-frame Image:
In some applications we want to study a sequence of images. This is very
common in biological and medical imaging where you might study a sequence of
slices of a cell. For these cases, the multi-frame format is a convenient way of


working with a sequence of images. In case you choose to work with biological
imaging later on in this course, you may use this format.

3.5 | HOW TO?
3.5.1 | How To Convert Between Different Formats:
The following table shows how to convert between the different formats given
above. All these commands require the Image processing tool box!
Table (3.1): Image format conversion (within the parentheses you type
the name of the image you wish to convert)
Operation | Matlab command
Convert intensity/indexed/RGB format to binary format. | dither()
Convert intensity format to indexed format. | gray2ind()
Convert indexed format to intensity format. | ind2gray()
Convert indexed format to RGB format. | ind2rgb()
Convert a regular matrix to intensity format by scaling. | mat2gray()
Convert RGB format to intensity format. | rgb2gray()
Convert RGB format to indexed format. | rgb2ind()

The command mat2gray is useful if you have a matrix representing an image
in which the values representing the gray scale range between, let's say, 0 and 1000.
The command mat2gray automatically rescales all entries so that they fall within
0 and 1 (the output is of class double).
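As a short example of a typical conversion chain using the commands in Table (3.1); the file name is a placeholder:

RGB = imread('myimage.jpg');      % an RGB (m-by-n-by-3) image
I   = rgb2gray(RGB);              % RGB -> intensity
[Xi, map] = gray2ind(I, 64);      % intensity -> indexed, 64-entry color map
I2  = ind2gray(Xi, map);          % indexed -> back to intensity
M   = mat2gray(40 * double(I));   % rescale an arbitrary matrix into [0, 1]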
3.5.2 | How to Read Files
When you encounter an image you want to work with, it is usually in form
of a file (for example, if you down load an image from the web, it is usually stored
as a JPEG-file). Once we are done processing an image, we may want to write it
back to a JPEG-file so that we can, for example, post the processed image on the
web. This is done using the imread and imwrite commands. These commands
require the Image processing tool box!

33
Chapter 3 | Image Processing

Table (3.2): Reading and writing image files
Operation | Matlab command
Read an image. (Within the parentheses you type the name of the image file you wish to read. Put the file name within single quotes.) | imread()
Write an image to a file. (As the first argument within the parentheses you type the name of the image you have worked with. As a second argument you type the name of the file and format that you want to write the image to. Put the file name within single quotes.) | imwrite()

Make sure to use a semicolon after these commands; otherwise you will get
lots of numbers scrolling on your screen. The commands imread and imwrite
support the formats given in the section "Image formats supported by Matlab"
above.
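For example (the file names are placeholders):

A = imread('myimage.jpg');        % read a JPEG file into a matrix
imwrite(A, 'myimage_copy.png');   % write it back out, here in PNG format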
3.5.3 | Loading And Saving Variables in Matlab
This section explains how to load and save variables in Matlab. Once you
have read a file, you probably convert it into an intensity image (a matrix) and
work with this matrix. Once you are done you may want to save the matrix
representing the image in order to continue working with it at another time. This
is easily done using the commands save and load. Note that save and load are
commonly used Matlab commands and work independently of which tool boxes
are installed.
Table (3.3): Loading and saving variables
Operation | Matlab command
Save the variable X. | save X
Load the variable X. | load X

3.5.4 | How to Display an Image in MATLAB
Here are a couple of basic Matlab commands (they do not require any tool box)
for displaying an image.


Table (3.4): Displaying an image given in matrix form
Operation | Matlab command
Display an image represented as the matrix X. | imagesc(X)
Adjust the brightness. s is a parameter such that -1<s<0 gives a darker image and 0<s<1 gives a brighter image. | brighten(s)
Change the colors to gray. | colormap(gray)

Sometimes your image may not be displayed in gray scale even though you
might have converted it into a gray scale image. You can then use the command
colormap (gray) to "force" Matlab to use a gray scale when displaying an image.
If you are using Matlab with an Image processing tool box installed, I
recommend you to use the command imshow to display an image.
Table (3.5): Displaying an image given in matrix form (with the Image processing tool box)
Operation | Matlab command
Display an image represented as the matrix X. | imshow(X)
Zoom in (using the left and right mouse button). | zoom on
Turn off the zoom function. | zoom off
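Putting the display commands together, a small sketch (the file name is a placeholder):

X = imread('myimage.jpg');              % placeholder file name
if ndims(X) == 3, X = rgb2gray(X); end  % make sure it is gray scale
imagesc(X);        % display the matrix as an image
colormap(gray);    % force a gray scale color map
brighten(0.3);     % 0 < s < 1 gives a brighter image
figure, imshow(X); % with the tool box installed, imshow is preferred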

3.6 | SOME IMPORTANT DEFINITIONS
3.6.1 | Imread Function
A = imread(filename, fmt) reads a grayscale or true color image named filename
into A. If the file contains a grayscale intensity image, A is a two-dimensional
array. If the file contains a true color (RGB) image, A is a three-dimensional
(m-by-n-by-3) array.
3.6.2 | Rotation
>> B = imrotate (A, ANGLE, METHOD)

Where;
A: Your image.
ANGLE: The angle (in degrees) you want to rotate your image in the counter
clockwise direction.
METHOD: A string that can have one of the values 'nearest', 'bilinear' or 'bicubic'.
If you omit the METHOD argument, IMROTATE uses the default method of
'nearest'.


Note: to rotate the image clockwise, specify a negative angle. The returned image
matrix B is, in general, larger than A to include the whole rotated image.
IMROTATE sets invalid values on the periphery of B to 0.
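For example, to rotate an image 30 degrees counter-clockwise with bilinear interpolation (the file name is a placeholder):

A = imread('myimage.jpg');
B = imrotate(A, 30, 'bilinear');   % a negative angle would rotate clockwise
imshow(B)                          % the periphery of B is filled with zeros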
3.6.3 | Scaling
IMRESIZE resizes an image of any type using the specified interpolation
method.
3.6.4 | Interpolation
Supported interpolation methods:
• 'nearest' (default): nearest neighbor interpolation
• 'bilinear': bilinear interpolation
• 'bicubic': bicubic interpolation
B = IMRESIZE(A,M,METHOD) returns an image that is M times the size of A. If
M is between 0 and 1.0, B is smaller than A. If M is greater than 1.0, B is larger
than A. If METHOD is omitted, IMRESIZE uses nearest neighbor interpolation.
B = IMRESIZE (A,[MROWS MCOLS],METHOD) returns an image of size
MROWS-by-MCOLS. If the specified size does not produce the same aspect ratio
as the input image has, the output image is distorted.
>> a = imread('image.fmt'); % put your image in place of image.fmt
>> B = imresize(a, [100 100], 'nearest');
>> imshow(B);
>> B = imresize(a, [100 100], 'bilinear');
>> imshow(B);
>> B = imresize(a, [100 100], 'bicubic');
>> imshow(B);

3.7 | EDGE DETECTION
3.7.1 | Canny Edge Detector
A good edge detector should satisfy three criteria:
1. Low error rate of detection: results should match human perception well.
2. Good localization of edges: the distance between actual edges in an image and
the edges found by a computational algorithm should be minimized.
3. Single response: the algorithm should not return multiple edge pixels when only
a single one exists.
3.7.2 | Edge Detectors

Fig. (3.4), Fig. (3.5): Edge detector outputs (bw, color, Canny, Sobel).

3.7.3 | Edge Tracing
b = rgb2gray(a); % convert to gray. We can only do edge tracing for gray images.
edge(b,'prewitt');
edge(b,'sobel');
edge(b,'sobel','vertical');
edge(b,'sobel','horizontal');
edge(b,'sobel','both');

We can only do edge tracing using gray scale images (i.e. images without color).


>> BW = rgb2gray(A);
>> edge(BW, 'prewitt')

Fig.(3.6)

That is what I saw!
>> edge(BW, 'sobel', 'vertical')
>> edge(BW, 'sobel', 'horizontal')
>> edge(BW, 'sobel', 'both')
Table (3.6): Data types
Type | Description | Range
int8 | 8-bit integer | -128 to 127
uint8 | 8-bit unsigned integer | 0 to 255
int16 | 16-bit integer | -32768 to 32767
double | Double precision real number | Machine specific

3.8 | MAPPING
3.8.1 | Mapping Images onto Surfaces Overview
Mapping an image onto geometry, also known as texture mapping, involves
overlaying an image or function onto a geometric surface. Images may be realistic,
such as satellite images, or representational, such as color-coded functions of
temperature or elevation. Unlike volume visualizations, which render each voxel
(volume element) of a three-dimensional scene, mapping an image onto geometry
efficiently creates the appearance of complexity by simply layering an image onto
a surface. The resulting realism of the display also provides information that is not
as readily apparent as with a simple display of either the image or the geometric
surface.
Mapping an image onto a geometric surface is a two step process. First, the
image is mapped onto the geometric surface in object space. Second, the surface
undergoes view transformations (relating to the viewpoint of the observer) and is
then displayed in 2D screen space. You can use IDL Direct Graphics or Object
Graphics to display images mapped onto geometric surfaces. The following table
introduces the tasks and routines.
Table (3.7): Tasks and Routines Associated with Mapping an Image onto Geometry
Routine(s)/Object(s) | Description
SHADE_SURF | Display the elevation data.
IDLgrWindow::Init, IDLgrView::Init, IDLgrModel::Init | Initialize the objects necessary for an Object Graphics display.
IDLgrSurface::Init | Initialize a surface object containing the elevation data.
IDLgrImage::Init | Initialize an image object containing the satellite image.
XOBJVIEW | Display the object in an interactive IDL utility allowing rotation and resizing.

3.8.2 | Mapping an Image onto Elevation Data
The following Object Graphics example maps a satellite image from the Los
Angeles, California vicinity onto a DEM (Digital Elevation Model) containing the
area's topographical features. The realism resulting from mapping the image onto
the corresponding elevation data provides a more informative view of the area’s
topography. The process is segmented into the following three sections:
• “Opening Image and Geometry Files”
• “Initializing the IDL Display Objects”
• “Displaying the Image and Geometric Surface Objects”


Note:
Data can be either regularly gridded (defined by a 2D array) or irregularly
gridded (defined by irregular x, y, z points). Both the image and elevation data used
in this example are regularly gridded. If you are dealing with irregularly gridded
data, use GRIDDATA to map the data to a regular grid.
Complete the following steps for a detailed description of the process.
Example Code:
See elevation_object.pro in the examples/doc/image subdirectory of the IDL
installation directory for code that duplicates this example. Run the example
procedure by entering elevation_object at the IDL command prompt, or view the file
in an IDL Editor window by entering .EDIT elevation_object.pro.
Opening Image and Geometry Files:
The following steps read in the satellite image and DEM files and display the
Elevation data.
1. Select the satellite image:
imageFile = FILEPATH('elev_t.jpg', $
SUBDIRECTORY = ['examples', 'data'])

2. Import the JPEG file:
READ_JPEG, imageFile, image
3. Select the DEM file:
demFile = FILEPATH('elevbin.dat', $
SUBDIRECTORY = ['examples', 'data'])

4. Define an array for the elevation data, open the file, read in the data and close
the file:
dem = READ_BINARY(demFile, DATA_DIMS = [64, 64])

5. Enlarge the size of the elevation array for display purposes:
dem = CONGRID(dem, 128, 128, /INTERP)

6. To quickly visualize the elevation data before continuing on to the Object
Graphics section, initialize the display, create a window and display the elevation
data using the SHADE_SURF command:
DEVICE, DECOMPOSED = 0


WINDOW, 0, TITLE = 'Elevation Data'
SHADE_SURF, dem

Fig.(3.7):Visual Display of the Elevation Data

After reading in the satellite image and DEM data, continue with the next
section to create the objects necessary to map the satellite image onto the elevation
surface.
3.8.3 | Initializing the IDL Display Objects
After reading in the image and surface data in the previous steps, you will
need to create objects containing the data. When creating an IDL Object Graphics
display, it is necessary to create a window object (oWindow), a view object
(oView) and a model object (oModel). These display objects, shown in the
conceptual representation in the following figure, will contain a geometric surface
object (the DEM data) and an image object (the satellite image).
These user-defined objects are instances of existing IDL object classes and
provide access to the properties and methods associated with each object class.


Note:
(The XOBJVIEW utility, described in "Mapping an Image Object onto a
Sphere", automatically creates window and view objects.)
Complete the following steps to initialize the necessary IDL objects.
1. Initialize the window, view and model display objects. For detailed syntax,
arguments and keywords available with each object initialization, see
IDLgrWindow::Init, IDLgrView::Init and IDLgrModel::Init. The following
three lines use the basic syntax:
oNewObject = OBJ_NEW('Class_Name')

To create these objects:
oWindow = OBJ_NEW('IDLgrWindow', RETAIN = 2, COLOR_MODEL = 0)
oView = OBJ_NEW('IDLgrView')
oModel = OBJ_NEW('IDLgrModel')

2. Assign the elevation surface data, dem, to an IDLgrSurface object. The
IDLgrSurface::Init keyword, STYLE = 2, draws the elevation data using a filled
line style:
oSurface = OBJ_NEW('IDLgrSurface', dem, STYLE = 2)

3. Assign the satellite image to a user-defined IDLgrImage object using
IDLgrImage::Init:
oImage = OBJ_NEW('IDLgrImage', image, INTERLEAVE = 0, $
/INTERPOLATE)
INTERLEAVE = 0 indicates that the satellite image is organized using pixel
interleaving, and therefore has the dimensions (3, m, n). The INTERPOLATE
keyword forces bilinear interpolation instead of using the default nearest neighbor
interpolation method.
3.8.4 | Displaying the Image and Geometric Surface Objects
This section displays the objects created in the previous steps. The image and
surface objects will first be displayed in an IDL Object Graphics window and then
with the interactive XOBJVIEW utility.


1. Center the elevation surface object in the display window. The default Object
Graphics coordinate system is [-1, -1], [1, 1]. To center the object in the window,
position the lower left corner of the surface data at [-0.5, -0.5, -0.5]
for the x, y and z dimensions:
oSurface -> GetProperty, XRANGE = xr, YRANGE = yr, $
ZRANGE = zr
xs = NORM_COORD(xr)
xs[0] = xs[0] - 0.5
ys = NORM_COORD(yr)
ys[0] = ys[0] - 0.5
zs = NORM_COORD(zr)
zs[0] = zs[0] - 0.5
oSurface -> SetProperty, XCOORD_CONV = xs, $
YCOORD_CONV = ys, ZCOORD_CONV = zs
2. Map the satellite image onto the geometric elevation surface using the
IDLgrSurface::Init TEXTURE_MAP keyword:
oSurface -> SetProperty, TEXTURE_MAP = oImage, $
COLOR = [255, 255, 255]
For the clearest display of the texture map, set COLOR = [255, 255, 255]. If the
image does not have dimensions that are exact powers of 2, IDL resamples the
image into a larger size whose dimensions are the next powers of two greater than
the original dimensions. This resampling may cause unwanted sampling artifacts.
In this example, the image does have dimensions that are exact powers of two, so
no resampling occurs.

Note:
(If your texture does not have dimensions that are exact powers of 2 and you
do not want to introduce resampling artifacts, you can pad the texture with unused
data to a power of two and tell IDL to map only a subset of the texture onto the
surface.) For example, if your image is 40 by 40, create a 64 by 64 image and fill
part of it with the image data:
textureImage = BYTARR(64, 64, /NOZERO)
textureImage[0:39, 0:39] = image ; image is 40 by 40
oImage = OBJ_NEW('IDLgrImage', textureImage)
Then, construct texture coordinates that map the active part of the texture to the
surface (oSurface):
; map the active 40-by-40 region of the 64-by-64 texture
textureCoords = [[0, 0], [40./64., 0], [40./64., 40./64.], [0, 40./64.]]


oSurface -> SetProperty, TEXTURE_COORD = textureCoords
The surface object in IDL 5.6 has been enhanced to perform the above
calculation automatically. In the above example, just use the image data (the 40 by
40 array) to create the image texture and do not supply texture coordinates. IDL
computes the appropriate texture coordinates to correctly use the 40 by 40 image.
Note:
(Some graphic devices have a limit for the maximum texture size. If your
texture is larger than the maximum size, IDL scales it down into dimensions that
work on the device. This rescaling may introduce resampling artifacts and loss of
detail in the texture. To avoid this, use the TEXTURE_HIGHRES keyword to tell
IDL to draw the surface in smaller pieces that can be texture mapped without loss
of detail.)
3. Add the surface object, covered by the satellite image, to the model object.
Then add the model to the view object:
oModel -> Add, oSurface
oView -> Add, oModel
4. Rotate the model for better display in the object window. Without rotating the
model, the surface is displayed at a 90 degree elevation angle, containing no depth
information. The following lines rotate the model 90 degrees away from the viewer
along the x-axis and 30 degrees clockwise along the y-axis and the x-axis:
oModel -> ROTATE, [1, 0, 0], -90
oModel -> ROTATE, [0, 1, 0], 30
oModel -> ROTATE, [1, 0, 0], 30

5. Display the result in the Object Graphics window:
oWindow -> Draw, oView

Fig. (3.9): Image Mapped onto a Surface in an Object Graphics Window


6. Display the results using XOBJVIEW, setting SCALE = 1 (instead of the
default value of 1/SQRT(3)) to increase the size of the initial display:
XOBJVIEW, oModel, /BLOCK, SCALE = 1
This results in the following display:

Fig.( 3.10) Displaying the Image Mapped onto the Surface in XOBJVIEW

After displaying the model, you can rotate it by clicking in the application
window and dragging your mouse. Select the magnify button, then click
near the middle of the image. Drag your mouse away from the center of the display
to magnify the image or toward the center of the display to shrink the image. Select
the left-most button on the XOBJVIEW toolbar to reset the display.
7. Destroy unneeded object references after closing the display windows:
OBJ_DESTROY, [oView, oImage]
The oModel and oSurface objects are automatically destroyed when oView is
destroyed.
For an example of mapping an image onto a regular surface using both Direct
and Object Graphics displays, see “Mapping an Image onto a Sphere”


3.8.5 | Mapping an Image onto a Sphere
The following example maps an image containing a color representation of
world elevation onto a sphere using both Direct and Object Graphics displays. The
example is broken down into two sections:
• “Mapping an Image onto a Sphere Using Direct Graphics” .
• “Mapping an Image Object onto a Sphere” .
3.9 | MAPPING OFFLINE
In the absence of a network or services, we can identify and follow a track
using image processing techniques. We incorporate a map made of images of the
places familiar to the person and determine how to reach them and return clearly
and safely.
We calculate distances using the Matlab function imdistline and, assuming a
walking speed, calculate the time it takes to get from one point to another. We then
guide the person along the road through voice commands, for example to move
forward, backward, left or right. We thus provide another way to perform mapping
without being online.
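A minimal sketch of the off-line distance and time estimate; the map file, waypoints, map scale and walking speed below are all assumed values for illustration:

map = imread('home_map.jpg');          % placeholder map image
pxPerMeter = 20;                       % assumed map scale, pixels per meter
speed = 1.2;                           % assumed walking speed in m/s
p1 = [50 120]; p2 = [410 300];         % assumed waypoints [x y] in pixels
distPx = hypot(p2(1) - p1(1), p2(2) - p1(2)); % straight-line pixel distance
distM  = distPx / pxPerMeter;          % distance in meters
tSec   = distM / speed;                % estimated travel time in seconds
fprintf('Distance %.1f m, about %.0f s\n', distM, tSec);
% For interactive measurement on the displayed map, the Image processing
% tool box tools imshow(map); h = imdistline; give the same pixel distance.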

CHAPTER 4
GPS Navigation

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

Smart blind stick book

Contents

Chapter-01: Introduction
  1.1 Problem Definition
  1.2 Problem Solution
  1.3 Business Model
  1.4 Block Diagram
  1.5 Detailed Technical Description
  1.6 Pre-Project Planning
  1.7 Time Planning

Chapter-02: Speech Recognition
  2.1 Introduction
  2.2 Literature Review
    2.2.1 Pattern recognition
    2.2.2 Generation of voice
    2.2.3 Voice as biometric
    2.2.4 Speech recognition
    2.2.5 Speaker recognition
    2.2.6 Speech/speaker modeling
  2.3 Implementation Details
    2.3.1 Pre-processing and feature extraction
  2.4 Artificial Neural Networks
    2.4.1 Introduction
    2.4.2 Models
    2.4.3 Network function
    2.4.4 ANN dependency graph
    2.4.5 Learning
    2.4.6 Choosing a cost function
    2.4.7 Learning paradigms
    2.4.8 Supervised learning
    2.4.9 Unsupervised learning
    2.4.10 Reinforcement learning
    2.4.11 Learning algorithms
    2.4.12 Employing artificial neural networks
    2.4.13 Applications
    2.4.14 Types of models
    2.4.15 Neural network software
    2.4.16 Types of artificial neural networks
    2.4.17 Confidence analysis of neural networks

Chapter-03: Image Processing
  3.1 Introduction
    3.1.1 What is digital image processing?
    3.1.2 Motivating problems
  3.2 Color Vision
    3.2.1 Fundamentals
    3.2.2 Image formats supported by MATLAB
    3.2.3 Working formats in MATLAB
  3.3 Aspects of Image Processing
  3.4 Image Types
    3.4.1 Intensity image
    3.4.2 Binary image
    3.4.3 Indexed image
    3.4.4 RGB image
    3.4.5 Multi-frame image
  3.5 How To
    3.5.1 How to convert between different formats
    3.5.2 How to read a file
    3.5.3 Loading and saving variables in MATLAB
    3.5.4 How to display an image in MATLAB
  3.6 Some Important Definitions
    3.6.1 The imread function
    3.6.2 Rotation
    3.6.3 Scaling
    3.6.4 Interpolation
  3.7 Edge Detection
    3.7.1 Canny edge detection
    3.7.2 Edge tracing
  3.8 Mapping
    3.8.1 Mapping an image onto a surface: overview
    3.8.2 Mapping an image onto elevation data
    3.8.3 Initializing the IDL display objects
    3.8.4 Displaying image and geometric surface objects
    3.8.5 Mapping an image onto a sphere
  3.9 Mapping Offline

Chapter-04: GPS Navigation
  4.1 Introduction
    4.1.1 What is GPS?
    4.1.2 How does it work?
  4.2 Basic Concepts of GPS
  4.3 Position Calculation
  4.4 Communication
  4.5 Message Format
  4.6 Satellite Frequencies
  4.7 Navigation Equations
  4.8 Bancroft's Method
  4.9 Trilateration
  4.10 Multidimensional Newton-Raphson Calculation
  4.11 Additional Method for More than Four Satellites
  4.12 Error Sources and Analysis
  4.13 Accuracy Enhancement and Surveying
    4.13.1 Augmentation
    4.13.2 Precise monitoring
  4.14 Time Keeping
    4.14.1 Time keeping and leap seconds
    4.14.2 Time keeping accuracy
    4.14.3 Time keeping format
    4.14.4 Carrier phase tracking
  4.15 GPS Navigation

Chapter-05: Ultrasound
  5.1 Introduction
    5.1.1 History
  5.2 Wave Motion
  5.3 Wave Characteristics
  5.4 Ultrasound Intensity
  5.5 Ultrasound Velocity
  5.6 Attenuation of Ultrasound
  5.7 Reflection
  5.8 Refraction
  5.9 Absorption
  5.10 Hardware Part
    5.10.1 Introduction
    5.10.2 Calculating the distance
    5.10.3 Changing beam pattern and beam width
    5.10.4 The development of the sensor

Chapter-06: Microcontroller
  6.1 Introduction
    6.1.1 History of microcontrollers
    6.1.2 Embedded design
    6.1.3 Interrupts
    6.1.4 Programs
    6.1.5 Other microcontroller features
    6.1.6 Higher integration
    6.1.7 Programming environment
  6.2 Types of Microcontrollers
    6.2.1 Interrupt latency
  6.3 Microcontroller Embedded Memory Technology
    6.3.1 Data
    6.3.2 Firmware
  6.4 PIC Microcontroller
    6.4.1 Family core architecture
  6.5 PIC Components
    6.5.1 Logic circuit
    6.5.2 Power supply
  6.6 Development Tools
    6.6.1 Device programmers
    6.6.2 Debugging
  6.7 LCD Display
    6.7.1 LCD display pins
    6.7.2 LCD screen
    6.7.3 LCD memory
    6.7.4 LCD basic commands
    6.7.5 LCD connection
    6.7.6 LCD initialization

Chapter-07: System Implementation
  7.1 Introduction
  7.2 Survey
  7.3 Searches
    7.3.1 Ultrasound sensor
    7.3.2 Indoor navigation systems
    7.3.3 Outdoor navigation
  7.4 Sponsors
  7.5 Pre-Design
    7.5.1 List of metrics
    7.5.2 Competitive benchmarking information
    7.5.3 Ideal and marginally acceptable target values
    7.5.4 Time plan diagram
  7.6 Design
    7.6.1 Speech recognition
    7.6.2 Ultrasound sensors
    7.6.3 Outdoor navigation
  7.7 Product Architecture
    7.7.1 Product schematic
    7.7.2 Rough geometric layout
    7.7.3 Incidental interactions
  7.8 Defining Secondary Systems
  7.9 Detailed Interface Specification
  7.10 Establishing the Architecture of the Chunks

Chapter-08: Conclusion
  8.1 Introduction
  8.2 Overview
    8.2.1 Outdoor navigation
      8.2.1.1 Outdoor navigation online
      8.2.1.2 Outdoor navigation offline
    8.2.2 Ultrasound sensor
    8.2.3 Object identifier
  8.3 Features
Chapter 1 | Introduction

1.1 | PROBLEM DEFINITION

According to the World Health Organization, approximately 36.9 million people worldwide were blind in 2002. The majority of them use a conventional white cane as a navigation aid. The limitation of the white cane is that information is gained only by touching objects with the tip of the cane. The traditional length of a white cane depends on the height of the user: it extends from the floor to the person's sternum.

Blind people also face great difficulty moving from place to place in town; their only alternative is a guide dog, which can cost about $20,000 and remains useful for only about 5-6 years. They also find it hard to identify the objects they frequently use at home, such as kitchen tools and clothes. Finally, they may struggle to control their electrical appliances, and they cannot respond effectively to a security problem.

1.2 | PROBLEM SOLUTION

We are trying to solve all of the problems above. To help the user move easily indoors and outdoors, we use an ultrasound sensor that detects obstacles in the way and alerts the user in two ways: a vibration motor whose speed increases as the distance decreases, and a voice alert that announces the distance between the user and the obstacle.

To solve the problem of moving outside the home from place to place, we designed smartphone software that guides the user with voice commands and without any external help: the user simply says the name of the place he wants to go, and the phone guides him there by voice, with no need to type anything.

To help the user identify objects, we use RFID: every important object carries a tag, and when the reader reads the tag ID, the system announces the object's name by voice.

Inside the home, we designed a system that controls all electronic devices by voice commands, together with a security system designed especially for blind users. Its most important element is the fire alarm: when a fire is detected, the system alerts the user with a call to his mobile phone and places another call to friends nearby for help. The security system also warns the user if he forgets to close his door.

After finishing these applications, we plan to add further features after graduation: new technologies to make moving in the street easier, to help with crossing roads, and to assist with reading books. The products currently available in the Egyptian market do not cover any of these needs.
A blind person needs to move, control his surroundings, and do his tasks by himself without help from anybody, yet today there is only a plain white stick with no technology or features. So, in the final product, we install a sensor and an RFID reader on the white stick, while the other part is software on the mobile phone that performs the navigation and automation tasks.

1.3 | BUSINESS MODEL

Our customers are blind and visually impaired people; almost one million people in Egypt have one of the problems above. Our product covers several of their needs: it helps them avoid obstacles in their way and guides them by voice toward the direction they must take, and it lets them move freely without external help in different places through an Android application designed especially for them, which guides them by voice along roads and tells them which direction to take to reach their goal.

To reach our goal, we met with different customers to learn exactly what they need; this gave us a vision of a final product that is comfortable to use. We were also guided technically by our sponsors to find the best way to cover all these needs. The products available in our market do not cover any of these needs; we found only a plain white stick without any assistive technology.

1.4 | BLOCK DIAGRAMS

Fig. (1.1): General Project Block Diagram
1.5 | DETAILED TECHNICAL DESCRIPTION

Our project is built on the simplest available technologies so that the result is comfortable for the user. We divided the project into two parts, software and hardware. The hardware part consists of a PIC MCU, an MP3 module, a camera module, and an ultrasound sensor module. The software part is an Android application installed on the mobile phone.

The hardware part has two operating modes, indoor and outdoor. Indoors, a single sensor measures range, and when the user comes within 2 cm of an object the camera module photographs the code attached to it and sends the image to the MCU. The MCU processes the image, identifies the code number, looks up the object name in a database, then requests from the WT588D MP3 module the address of the MP3 file containing that name and plays it through the speaker.

Outdoors, three HC-SR04 ultrasound sensors are activated in three directions to determine the clearest path with no obstacles on it. The measured data is sent to the MCU, which selects the best direction and sends the MP3 module the address of the clip announcing that direction, which is then played as the output.

For outdoor navigation we designed an Android application using Google Maps: the user speaks the name of the destination, the application determines his current position using GPS, the digital compass determines the viewing angle, and the application guides him using the GPS and compass data.

Choose Mode: Left Button selects Outdoor, Right Button selects Indoor.

Fig. (1.2): Button Configuration
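As a rough illustration of the outdoor direction-selection logic described above (not the project's actual PIC firmware), the following Python sketch picks the direction whose ultrasound reading shows the most clearance. The sensor-reading function, the direction labels, and the clearance threshold are all assumptions for the example.

# Hypothetical sketch of the outdoor best-direction logic, not the real firmware.
# read_distance_cm() stands in for an HC-SR04 driver and is an assumption.

DIRECTIONS = ["left", "front", "right"]

def read_distance_cm(direction: str) -> float:
    """Placeholder for an HC-SR04 echo-time measurement in one direction."""
    raise NotImplementedError("replace with real sensor I/O")

def best_direction(readings: dict[str, float], min_clearance_cm: float = 50.0) -> str:
    """Return the direction with the most free space, or 'stop' if all are blocked."""
    direction, distance = max(readings.items(), key=lambda kv: kv[1])
    return direction if distance >= min_clearance_cm else "stop"

if __name__ == "__main__":
    # Example with fixed readings instead of live sensors.
    sample = {"left": 35.0, "front": 120.0, "right": 80.0}
    print(best_direction(sample))  # -> "front"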
Fig. (1.3): Indoor & Outdoor Processes Block Diagram

1.6 | PRE-PROJECT PLANNING

We started by searching for a problem that nobody was addressing, and we found that blind people's problems receive little attention and that suitable products are not available in Egypt. So we judged this a good field to start in: an opportunity to solve a real problem and to enter a new market segment with few competitors.

1.7 | TIME PLANNING

Project Timing: The three main parts are independent in execution time, but each part has several branches that must be executed in series.

Timing of Product Introduction: The timing of launching the product depends on the marketing effort and on renewed study of the market, since the product must have low cost and high quality.
Technology Readiness: Technology is a fundamental component of the product, because Android and ultrasonic technology are gaining real importance among Egyptian customers.

Market Readiness: The market is always ready for a new product; products compete in the market to give customers the best choice.

The Product Plan: This plan makes the project easier to implement, because anything arranged or planned in advance gives the best results.
Chapter 2 | Speech Recognition

2.1 | INTRODUCTION

Biometrics is, in the simplest definition, something you are: a physical characteristic unique to each individual, such as a fingerprint, retina, iris, or voice. Biometrics has a very useful application in security; it can be used to authenticate a person's identity and control access to a restricted area, based on the premise that this set of physical characteristics can uniquely identify an individual.

A speech signal conveys two important types of information: primarily the speech content, and on a secondary level the speaker identity. Speech recognizers aim to extract the lexical information from the speech signal independently of the speaker, by reducing inter-speaker variability. Speaker recognition, on the other hand, is concerned with extracting the identity of the person speaking the utterance. So both speech recognition and speaker recognition are possible from the same voice input. In our project we use speech recognition, because the stick must act on the word that was spoken.

Mel-Frequency Cepstral Coefficients (MFCCs) are used as features for both speech and speaker recognition. We also combined energy features and the delta and delta-delta features of energy and MFCC. After calculating the features, neural networks are used to model the speech. Based on the speech model, the system decides whether or not the uttered speech matches what the user was prompted to utter.

2.2 | LITERATURE REVIEW

2.2.1 | Pattern Recognition

Pattern recognition, one of the branches of artificial intelligence and a sub-field of machine learning, is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. A pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech signal, a sales pattern, etc.

The applications of pattern recognition include data mining, document classification, financial forecasting, organization and retrieval of multimedia databases, and biometrics (personal identification based on physical attributes such as the face, retina, voice, ear, and fingerprints). The essential steps of pattern recognition are data acquisition, preprocessing, feature extraction, training, and classification.
Features denote the descriptors. Features must be selected so that they are discriminative and invariant; they can be represented as a vector, matrix, tree, graph, or string. Ideally they are similar for objects in the same class and very different for objects in different classes. A pattern class is a family of patterns that share some common properties. Pattern recognition by machine involves techniques for assigning patterns to their respective classes automatically, with as little human intervention as possible.

Learning and classification usually follow one of these approaches. Statistical pattern recognition is based on statistical characterizations of patterns, assuming that the patterns are generated by a probabilistic system. Syntactic (or structural) pattern recognition is based on the structural interrelationships of features.

Given a pattern, its recognition/classification may consist of one of the following two tasks, according to the type of learning procedure:

1) Supervised classification (e.g., discriminant analysis), in which the input pattern is identified as a member of a predefined class.
2) Unsupervised classification (e.g., clustering), in which the pattern is assigned to a previously unknown class.

Fig. (2.1): General block diagram of a pattern recognition system
2.2.2 | Generation of Voice

Speech begins with the generation of an airstream, usually by the lungs and diaphragm, a process called initiation. This air then passes through the larynx, where it is modulated by the glottis (vocal cords). This step is called phonation or voicing, and is responsible for the generation of pitch and tone. Finally, the modulated air is filtered by the mouth, nose, and throat, a process called articulation, and the resultant pressure wave excites the air.

Fig. (2.2): Vocal Schematic

Depending upon the positions of the various articulators, different sounds are produced. The articulator positions can be modeled by a linear time-invariant system whose frequency response is characterized by several peaks called formants. The change in the frequencies of the formants characterizes the phoneme being articulated.

As a consequence of this physiology, we can notice several characteristics in the frequency-domain spectrum of speech. First of all, the oscillation of the glottis results in an underlying fundamental frequency and a series of harmonics at multiples of this fundamental.
This is shown in the figure below, where we plot a brief audio waveform for the phoneme /i:/ together with its magnitude spectrum. The fundamental frequency (180 Hz) and its harmonics appear as spikes in the spectrum. The location of the fundamental frequency is speaker dependent, a function of the dimensions and tension of the vocal cords. For adults it usually falls between 100 Hz and 250 Hz, with female averages significantly higher than male averages.

Fig. (2.3): Audio sample for the /i:/ phoneme, showing the stationary property of phonemes over a short period

Sound comes out in phonemes, the building blocks of speech. Each phoneme resonates at a fundamental frequency and its harmonics, and thus has high energy at those frequencies; in other words, each phoneme has different formants. This is the feature that enables the identification of each phoneme at the recognition stage.

Fig. (2.4): Audio magnitude spectrum for the /i:/ phoneme, showing the fundamental frequency and its harmonics
The variations in the features of the speech signal during the utterance of a word are modeled in word training for speech recognition; for speaker recognition, the intra-speaker variations in features over long speech content are modeled. Besides the configuration of the articulators, the acoustic manifestation of a phoneme is affected by:
- the physiology and emotional state of the speaker;
- the phonetic context;
- accent.

2.2.3 | Voice as Biometric

The underlying premise for voice authentication is that each person's voice differs enough in pitch, tone, and volume to be uniquely distinguishable. Several factors contribute to this uniqueness: the size and shape of the mouth, throat, nose, and teeth (the articulators) and the size, shape, and tension of the vocal cords. The chance that all of these are exactly the same in any two people is very low. Voice biometrics has the following advantages over other forms of biometrics:
- it is a natural signal to produce;
- implementation cost is low, since it does not require a specialized input device;
- it is acceptable to users and is easily combined with other forms of authentication for multi-factor authentication; it is the only biometric that allows users to authenticate remotely.

2.2.4 | Speech Recognition

Speech is the dominant means of communication between humans, and it promises to be important for communication between humans and machines, if it can be made a little more reliable. Speech recognition is the process of converting an acoustic signal to a set of words. Applications include voice command and control, data entry, voice user interfaces, automating the telephone operator's job in telephony, etc. Speech recognition can also serve as the input to natural language processing.

There are two variants of speech recognition based on the duration of the speech signal: isolated word recognition, in which each word is surrounded by some sort of pause, is much easier than continuous speech recognition, in which words run into each other and have to be segmented. Speech recognition is a difficult task because of the many sources of variability associated with the signal.
For example, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on context. Acoustic variability can also result from changes in the environment and in the position and characteristics of the transducer. Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape contribute to cross-speaker variability. Such variability is modeled in various ways; at the level of signal representation, a representation that emphasizes speaker-independent features is developed.

2.2.5 | Speaker Recognition

Speaker recognition is the process of automatically recognizing who is speaking on the basis of the individual information included in the speech waves. Speaker recognition can be classified into identification and verification, and it has been applied most often as a means of biometric authentication.

2.2.5.1 | Types of Speaker Recognition

Speaker Identification

Speaker identification is the process of determining which registered speaker produced a given utterance. In a Speaker Identification (SID) system, no identity claim is provided; the test utterance is scored against a set of known (registered) references for each potential speaker, and the speaker whose model best matches the test utterance is selected. There are two types of speaker identification task: closed-set and open-set. In closed-set identification, the test utterance belongs to one of the registered speakers; during testing, a matching score is estimated for each registered speaker, and the speaker corresponding to the model with the best score is selected. This requires N comparisons for a population of N speakers. In open-set identification, any speaker can access the system, and those who are not registered should be rejected. This requires another model, referred to as a garbage model, imposter model, or background model, trained with data provided by speakers other than the registered ones. During testing, the matching score of the best speaker model is compared with the score estimated using the garbage model in order to accept or reject the speaker, making the total number of comparisons N + 1.
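The closed-set and open-set decision rules described above reduce to a few lines of logic. The following Python sketch is illustrative only; the per-speaker scores (e.g., log-likelihoods from speaker models) and the example values are assumed to come from elsewhere.

# Illustrative sketch of closed-set vs. open-set speaker identification decisions.
# The per-speaker scores are assumed to be computed by trained speaker models.

def identify_closed_set(scores: dict[str, float]) -> str:
    """Closed-set SID: pick the registered speaker with the best matching score."""
    return max(scores, key=scores.get)

def identify_open_set(scores: dict[str, float], garbage_score: float) -> str:
    """Open-set SID: accept the best speaker only if they beat the garbage model."""
    best = max(scores, key=scores.get)       # N comparisons
    if scores[best] > garbage_score:         # +1 comparison against the imposter model
        return best
    return "rejected"

if __name__ == "__main__":
    scores = {"alice": -41.2, "bob": -38.7, "carol": -45.0}   # hypothetical values
    print(identify_closed_set(scores))        # -> "bob"
    print(identify_open_set(scores, -37.5))   # -> "rejected"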
Speaker identification performance tends to decrease as the population size increases.

Speaker Verification

Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker: the goal is to automatically accept or reject an identity that is claimed by the speaker. During testing, a verification score is estimated using the claimed speaker model and an anti-speaker model; this score is then compared to a threshold. If the score is higher than the threshold the speaker is accepted; otherwise the speaker is rejected. Speaker verification thus involves a hypothesis test requiring a simple binary decision, accept or reject the claimed identity, regardless of the population size. Hence performance is largely independent of the population size, though it depends on the number of test utterances used to evaluate the system.

2.2.6 | Speaker/Speech Modeling

There are various pattern modeling and matching techniques, including Dynamic Time Warping (DTW), the Gaussian Mixture Model (GMM), Hidden Markov Modeling (HMM), Artificial Neural Networks (ANN), and Vector Quantization (VQ). These are used interchangeably for speech and speaker modeling. The best-established approaches are statistical learning methods: GMMs for speaker recognition, which model the variations in a speaker's features over a long sequence of utterances, and HMMs, which are widely used for speech recognition. An HMM models the Markovian nature of the speech signal, where each phoneme represents a state and a sequence of such phonemes represents a word; the sequences of features of these phonemes from different speakers are modeled by the HMM.

2.3 | IMPLEMENTATION DETAILS

The implementation of the system includes a common pre-processing and feature extraction module, followed by speaker-independent speech modeling and classification by ANNs.

2.3.1 | Pre-Processing and Feature Extraction
Starting from the capture of the audio signal, feature extraction consists of the following steps, as shown in the block diagram below: speech signal, silence removal, pre-emphasis, framing, windowing, DFT, Mel filter bank, log, IDFT, and CMS, yielding per frame 12 MFCC, 12 ΔMFCC, 12 ΔΔMFCC, 1 energy, 1 Δ-energy, and 1 ΔΔ-energy features.

Fig. (2.5): Pre-Processing and Feature Extraction

2.3.1.1 | Capture

The first step in processing speech is to convert the analog representation (first air pressure, then analog electric signals in a microphone) into a digital signal x[n], where n is an index over time. Analysis of the audio spectrum shows that nearly all the energy resides in the band between DC and 4 kHz, and beyond 10 kHz there is virtually no energy whatsoever. The sound format used is: 22050 Hz, 16-bit signed little-endian, mono channel, uncompressed PCM.
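As a small sketch of the capture and amplitude-normalization steps, the Python snippet below loads a WAV file in the stated format (16-bit signed little-endian mono PCM at 22050 Hz) and scales it to [-1, 1]; the file name is hypothetical.

# Minimal sketch: load 16-bit mono PCM audio and normalize its amplitude.
import wave
import numpy as np

def load_pcm16(path: str) -> tuple[np.ndarray, int]:
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1 and wf.getsampwidth() == 2  # mono, 16-bit
        rate = wf.getframerate()                                  # expect 22050 Hz
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64)  # signed little-endian
    return samples / 32768.0, rate                                # PCM normalization

# x, fs = load_pcm16("utterance.wav")   # "utterance.wav" is a placeholder name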
2.3.1.2 | End Point Detection and Silence Removal

The captured audio signal may contain silence at different positions: at the beginning of the signal, between the words of a sentence, at the end of the signal, and so on. If silent frames are included, modeling resources are spent on parts of the signal that do not contribute to identification, so the silence must be removed before further processing. There are several ways of doing this; the most popular are short-time energy and zero-crossing rate, but both have limitations, since their thresholds are set on an ad hoc basis. The algorithm we used instead relies on the statistical properties of the background noise, together with a physiological aspect of speech production, and does not assume any ad hoc threshold.

It assumes that the background noise present in the utterances is Gaussian in nature. Usually the first 200 ms or more of a speech recording (we used 4410 samples at a sampling rate of 22050 samples/sec) corresponds to silence (or background noise), because the speaker takes some time to start reading when recording begins.

Endpoint Detection Algorithm:

Step 1: Calculate the mean (μ) and standard deviation (σ) of the first 200 ms of samples of the given utterance. The background noise is characterized by this μ and σ.

Step 2: Go from the first sample to the last sample of the recording. For each sample, check whether the one-dimensional Mahalanobis distance, |x − μ| / σ, is greater than 3. If it is, the sample is treated as voiced; otherwise it is unvoiced/silence. This threshold rejects up to 99.7% of the noise samples, since P[|x − μ| ≤ 3σ] = 0.997 for a Gaussian distribution, thus accepting only the voiced samples.

Step 3: Mark each voiced sample as 1 and each unvoiced sample as 0. Divide the whole speech signal into 10 ms non-overlapping windows, so the complete speech is represented by only zeros and ones.

Step 4: Suppose there are M zeros and N ones in a window. If M ≥ N, then convert each of the ones to zeros, and vice versa. This step reflects the fact that a speech production system consisting of the vocal cords, tongue, vocal tract, etc. cannot change abruptly within the short 10 ms window used here.

Step 5: Collect only the voiced part, according to the samples labeled 1 in the windowed array, and dump it into a new array; that is, retrieve the voiced part of the original speech signal from the samples labeled 1.
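A compact NumPy sketch of the five steps above follows. The 3σ threshold, 200 ms noise window, and 10 ms voting windows come from the text; the array-handling details are our own.

# Sketch of the Gaussian endpoint-detection algorithm described above.
import numpy as np

def remove_silence(x: np.ndarray, fs: int = 22050) -> np.ndarray:
    noise = x[: int(0.2 * fs)]                     # Step 1: first 200 ms = background noise
    mu, sigma = noise.mean(), noise.std()
    voiced = np.abs(x - mu) / sigma > 3.0          # Step 2: 3-sigma Mahalanobis test

    win = int(0.010 * fs)                          # Step 3: 10 ms non-overlapping windows
    n_win = len(x) // win
    labels = voiced[: n_win * win].reshape(n_win, win)

    keep = labels.sum(axis=1) > (win // 2)         # Step 4: majority vote per window
    mask = np.repeat(keep, win)                    # Step 5: keep only the voiced windows
    return x[: n_win * win][mask]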
Fig. (2.6): Input signal to the endpoint detection system

Fig. (2.7): Output signal from the endpoint detection system

2.3.1.3 | PCM Normalization

The extracted pulse-code-modulated amplitude values are normalized, to avoid amplitude variation during capture.

2.3.1.4 | Pre-emphasis

The speech signal is usually pre-emphasized before any further processing. If we look at the spectrum of voiced segments such as vowels, there is more energy at lower frequencies than at higher frequencies. This drop in energy across frequencies is caused by the nature of the glottal pulse. Boosting the high-frequency energy makes information from the higher formants more available to the acoustic model and improves phone detection accuracy. The pre-emphasis filter is a first-order high-pass filter. In the time domain, with input x[n] and 0.9 ≤ α ≤ 1.0, the filter equation is:

y[n] = x[n] − α x[n−1]

We used α = 0.95.
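The filter equation above translates directly into a few lines of NumPy; the handling of the first sample (which has no predecessor) is our own choice.

# Sketch of the first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1].
import numpy as np

def pre_emphasize(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    y = np.empty_like(x)
    y[0] = x[0]                      # first sample is passed through unchanged
    y[1:] = x[1:] - alpha * x[:-1]
    return y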
Fig. (2.8): Signal before pre-emphasis

Fig. (2.9): Signal after pre-emphasis

2.3.1.5 | Framing and Windowing

Speech is a non-stationary signal, meaning that its statistical properties are not constant across time. Instead, we extract spectral features from a small window of speech that characterizes a particular sub-phone, for which we can make the (rough) assumption that the signal is stationary, i.e., that its statistical properties are constant within this region. We used frame blocks of 23.22 ms with 50% overlap, i.e., 512 samples per frame.
Fig. (2.10): Frame blocking of the signal

The rectangular window (i.e., no window) can cause problems when we do Fourier analysis, because it abruptly cuts off the signal at its boundaries. A good window function has a narrow main lobe and low side-lobe levels in its transfer function; it shrinks the values of the signal toward zero at the window boundaries, avoiding discontinuities. The most commonly used window function in speech processing is the Hamming window, defined as follows:

w[n] = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1

Fig. (2.11): Hamming window

The extraction of the signal takes place by multiplying the value of the signal at time n, Sframe[n], by the value of the window at time n, Sw[n]:

Y[n] = Sw[n] × Sframe[n]
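The following sketch combines the framing and windowing steps, using the 512-sample frames with 50% overlap and the Hamming window stated above; it assumes the signal is at least one frame long, and trailing samples that do not fill a frame are dropped.

# Sketch: split the signal into 512-sample frames with 50% overlap and apply
# the Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)) to each frame.
import numpy as np

def frame_and_window(x: np.ndarray, frame_len: int = 512) -> np.ndarray:
    hop = frame_len // 2                                  # 50% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * w                                     # Y[n] = Sw[n] * Sframe[n]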
Fig. (2.12): A single frame before and after windowing

2.3.1.6 | Discrete Fourier Transform

A Discrete Fourier Transform (DFT) of the windowed signal is used to extract the frequency content (the spectrum) of the current frame. The DFT is the tool for extracting spectral information, i.e., how much energy the signal contains at discrete frequency bands, from a discrete-time (sampled) signal. The input to the DFT is a windowed signal x[n]…x[m], and the output, for each of N discrete frequency bands, is a complex number X[k] representing the magnitude and phase of that frequency component in the original signal:

X[k] = Σ(n=0 to N−1) x[n] e^(−j2πkn/N)

The commonly used algorithm for computing the DFT is the Fast Fourier Transform, or FFT for short.

2.3.1.7 | Mel Filter

For calculating the MFCCs, a frequency transformation is first applied according to the following formula:

mel(x) = 2595 log10(1 + x/700)

where x is the linear frequency in Hz. Then a filter bank is applied to the amplitude of the Mel-scaled spectrum. The Mel frequency warping is most conveniently done by utilizing a filter bank with filters centered according to Mel frequencies.
The width of the triangular filters varies according to the Mel scale, so that the log total energy in a critical band around the center frequency is included. The centers of the filters are uniformly spaced on the Mel scale.

Fig. (2.13): Equally spaced Mel values

The result of the Mel filter stage is information about the distribution of energy in each Mel-scale band: we obtain a vector of filter outputs, one per filter.

Fig. (2.14): Triangular filter bank in frequency scale

We used 30 filters in the filter bank.
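A sketch of the FFT magnitude spectrum followed by a 30-filter triangular Mel filter bank is shown below. The mel formula and the uniform Mel spacing come from the text; the bin-mapping details and the use of NumPy's rfft are our own assumptions.

# Sketch: magnitude spectrum via FFT, then a 30-filter triangular Mel filter bank.
# mel(f) = 2595*log10(1 + f/700); filter centers are uniformly spaced in Mel.
import numpy as np

def mel(f):      return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_inv(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=30, n_fft=512, fs=22050) -> np.ndarray:
    # Filter edge frequencies, uniformly spaced on the Mel scale.
    edges = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft / 2) * edges / (fs / 2)).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):                           # triangular filters
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        if ctr > lo:
            fb[i, lo:ctr] = (np.arange(lo, ctr) - lo) / (ctr - lo)   # rising edge
        if hi > ctr:
            fb[i, ctr:hi] = (hi - np.arange(ctr, hi)) / (hi - ctr)   # falling edge
    return fb

def mel_energies(frames: np.ndarray, fb: np.ndarray) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(frames, n=(fb.shape[1] - 1) * 2))  # |X[k]|
    return spectrum @ fb.T                               # energy per Mel band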
2.3.1.8 | Cepstrum by Inverse Discrete Fourier Transform

A cepstrum transform is applied to the filter outputs in order to obtain the MFCC features of each frame. The triangular filter outputs Y(i), i = 0, 1, 2, …, M−1, are compressed using the logarithm, and the discrete cosine transform (DCT) is applied. Here M equals the number of filters in the filter bank, i.e., 30:

C[n] = Σ(i=0 to M−1) log(Y(i)) cos(πn(i + 0.5)/M)

where C[n] is the MFCC vector for each frame. The resulting vector is called the Mel-frequency cepstrum (MFC), and its individual components are the Mel-frequency cepstral coefficients (MFCCs). We extracted 12 features from each speech frame.

2.3.1.9 | Post Processing

Cepstral Mean Subtraction (CMS)

A speech signal may be subjected to some channel noise when recorded, also referred to as the channel effect. A problem arises if the channel effect when recording the training data for a given person is different from the channel effect in later recordings when the person uses the system: a false distance between the training data and the newly recorded data is introduced by the differing channel effects. The channel effect is eliminated by subtracting the mean Mel-cepstrum coefficients from the Mel-cepstrum coefficients:

Ĉt[n] = Ct[n] − (1/T) Σ(τ=1 to T) Cτ[n]

The Energy Feature

The energy in a frame is the sum over time of the power of the samples in the frame; thus, for a signal x in a window from time sample t1 to time sample t2, the energy is:

E = Σ(t=t1 to t2) x[t]²
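The log-DCT, CMS, and energy formulas above map onto the sketch below. Keeping coefficients n = 1 through 12 (i.e., skipping the zeroth DCT coefficient) is our own assumption, as is the small constant added to avoid log(0).

# Sketch: log + DCT on the 30 Mel energies (keep 12 MFCCs), plus CMS and
# the per-frame energy feature, matching the formulas above.
import numpy as np

def mfcc_from_mel(mel_energy: np.ndarray, n_ceps: int = 12) -> np.ndarray:
    n_frames, M = mel_energy.shape                       # M = 30 filters
    i = np.arange(M)
    log_e = np.log(mel_energy + 1e-12)                   # avoid log(0)
    basis = np.array([np.cos(np.pi * n * (i + 0.5) / M) for n in range(1, n_ceps + 1)])
    return log_e @ basis.T                               # C[n] for each frame

def cepstral_mean_subtraction(ceps: np.ndarray) -> np.ndarray:
    return ceps - ceps.mean(axis=0)                      # remove the channel effect

def frame_energy(frames: np.ndarray) -> np.ndarray:
    return (frames ** 2).sum(axis=1)                     # E = sum of x[t]^2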
The Delta Feature

Another interesting fact about the speech signal is that it is not constant from frame to frame. Co-articulation (the influence of one speech sound on an adjacent or nearby speech sound) can provide a useful cue for phone identity, and it can be preserved by using delta features. Velocity (delta) and acceleration (delta-delta) coefficients are obtained from the static window-based information; these delta and delta-delta coefficients model the speed and acceleration of the variation of the cepstral feature vectors across adjacent windows. A simple way to compute deltas would be just to take the difference between frames; thus the delta value d[t] for a particular cepstral value c[t] at time t can be estimated as:

d[t] = c[t+1] − c[t−1]

The differencing method is simple, but since it acts as a high-pass filtering operation in the parameter domain, it tends to amplify noise. The solution is linear regression, i.e., a first-order polynomial fit, whose least-squares solution is easily shown to be of the following form (a code sketch follows the feature list below):

d[t] = Σ(m=1 to M) m (c[t+m] − c[t−m]) / (2 Σ(m=1 to M) m²)

where M is the regression window size. We used M = 4.

Composition of the Feature Vector

We calculated 39 features from each frame:
- 12 MFCC features
- 12 delta MFCC
- 12 delta-delta MFCC
- 1 energy feature
- 1 delta energy feature
- 1 delta-delta energy feature
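Here is the promised sketch of the regression-based deltas with M = 4. Replicating the edge frames to handle the boundaries is our own assumption; delta-deltas are then just the deltas of the deltas.

# Sketch of the linear-regression delta features:
# d[t] = sum_{m=1..M} m*(c[t+m] - c[t-m]) / (2 * sum_{m=1..M} m^2), M = 4.
import numpy as np

def delta(features: np.ndarray, M: int = 4) -> np.ndarray:
    padded = np.pad(features, ((M, M), (0, 0)), mode="edge")  # replicate edge frames
    denom = 2.0 * sum(m * m for m in range(1, M + 1))
    out = np.zeros_like(features, dtype=float)
    for m in range(1, M + 1):
        out += m * (padded[M + m : M + m + len(features)]
                    - padded[M - m : M - m + len(features)])
    return out / denom

# Delta-deltas: dd = delta(delta(mfcc))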
2.4 | ARTIFICIAL NEURAL NETWORKS

2.4.1 | Introduction

We used ANNs to model our system: voices are trained, and test utterances are then classified into word categories, each of which triggers an action. Here we give an overview of artificial neural networks.

The original inspiration for the term Artificial Neural Network came from examination of central nervous systems and their neurons, axons, dendrites, and synapses, which constitute the processing elements of the biological neural networks investigated by neuroscience. In an artificial neural network, simple artificial nodes, variously called "neurons", "neurodes", "processing elements" (PEs), or "units", are connected together to form a network of nodes that mimics a biological neural network, hence the term "artificial neural network".

Because neuroscience is still full of unanswered questions, and since there are many levels of abstraction and therefore many ways to take inspiration from the brain, there is no single formal definition of an artificial neural network. Generally, it involves a network of simple processing elements that exhibit complex global behavior determined by the connections between the processing elements and by element parameters. While an artificial neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strengths (weights) of the connections in the network to produce a desired signal flow. These networks are also similar to biological neural networks in that functions are performed collectively and in parallel by the units, rather than through a clear delineation of subtasks to which various units are assigned (see also connectionism).

Currently, the term Artificial Neural Network (ANN) tends to refer mostly to the neural network models employed in statistics, cognitive psychology, and artificial intelligence. Neural network models designed to emulate the central nervous system (CNS) are a subject of theoretical neuroscience and computational neuroscience.

In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks, or parts of neural networks (such as artificial neurons), are used as components in larger systems that combine both adaptive and non-adaptive elements. While the more general approach of such adaptive systems is more suitable for real-world problem solving, it has far less to do with the traditional artificial-intelligence connectionist models. What they have in common, however, is the principle of non-linear, distributed, parallel, and local processing and adaptation. Historically, the use of neural network models marked a paradigm shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in if-then rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a dynamical system.

2.4.2 | Models
Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y or a distribution over X, or over both X and Y, but sometimes the models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase "ANN model" really means the definition of a class of such functions, where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity.

2.4.3 | Network Function

The word network in the term "artificial neural network" refers to the interconnections between the neurons in the different layers of each system. An example system has three layers: the first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems have more layers of neurons, some with increased numbers of input neurons and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations.

An ANN is typically defined by three types of parameters:
- the interconnection pattern between the different layers of neurons;
- the learning process for updating the weights of the interconnections;
- the activation function that converts a neuron's weighted input to its output activation.

Mathematically, a neuron's network function f(x) is defined as a composition of other functions gi(x), which can themselves be defined as compositions of further functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum

f(x) = K(Σi wi gi(x))

where K (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent. It will be convenient in the following to refer to a collection of functions gi simply as a vector g.

2.4.4 | ANN Dependency Graph

The figure depicts such a decomposition of f, with the dependencies between variables indicated by arrows. These can be interpreted in two ways. The first is the functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f. This view is most commonly encountered in the context of optimization.
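As a concrete (purely illustrative) instance of the nonlinear weighted-sum composition and the 3-dimensional-then-2-dimensional decomposition just described, the sketch below stacks three such layers in NumPy; the input size, tanh activation, and random weights are our own choices.

# Sketch of f(x) = K(sum_i w_i * g_i(x)) stacked into a 3-layer feed-forward net.
import numpy as np

rng = np.random.default_rng(0)

def layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.tanh(W @ x + b)          # K = tanh applied to the weighted sum

# Dimensions echoing the text: input -> 3-dim h -> 2-dim g -> scalar f.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
W3, b3 = rng.normal(size=(1, 2)), np.zeros(1)

def f(x: np.ndarray) -> np.ndarray:
    h = layer(x, W1, b1)               # first hidden layer (3-dimensional)
    g = layer(h, W2, b2)               # second hidden layer (2-dimensional)
    return layer(g, W3, b3)            # output

print(f(np.ones(4)))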
The second view is the probabilistic view: the random variable F depends upon the random variable G, which depends upon H, which depends upon the random variable X. This view is most commonly encountered in the context of graphical models. The two views are largely equivalent: in either case, for this particular network architecture, the components of the individual layers are independent of each other given their input (e.g., the components of g are independent of each other given h). This naturally enables a degree of parallelism in the implementation.

Two separate depictions of the recurrent ANN dependency graph. Networks such as the previous one are commonly called feed-forward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent; such networks are commonly depicted in the manner shown at the top of the figure, where f is shown as depending upon itself. However, an implied temporal dependence is not shown.

2.4.5 | Learning

What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve and a class of functions F, learning means using a set of observations to find f* in F that solves the task in some optimal sense. This entails defining a cost function C such that, for the optimal solution f*, C(f*) ≤ C(f) for all f in F; i.e., no solution has a cost less than the cost of the optimal solution (see mathematical optimization).

The cost function is an important concept in learning, as it is a measure of how far a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost. For applications where the solution depends on some data, the cost must necessarily be a function of the observations; otherwise we would not be modeling anything related to the data. It is frequently defined as a statistic to which only approximations can be made. As a simple example, consider the problem of finding the model f that minimizes the expected squared error C = E[(f(x) − y)²] for data pairs (x, y) drawn from some distribution D. In practical situations we would only have N samples from D, and thus, for the above example, we would only minimize the empirical average (1/N) Σ(f(xi) − yi)². Thus the cost is minimized over a sample of the data rather than over the entire data set.
When N → ∞, some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when D is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used even for finite datasets.

2.4.6 | Choosing a cost function

While it is possible to define some arbitrary, ad hoc cost function, frequently a particular cost will be used, either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An overview of the three main categories of learning tasks is provided below.

2.4.7 | Learning paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning.

2.4.8 | Supervised learning

In supervised learning, we are given a set of example pairs (x, y) with x ∈ X and y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.

A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output, f(x), and the target value y over all the example pairs. When one tries to minimize this cost using gradient descent for the class of neural networks called multilayer perceptrons, one obtains the common and well-known back-propagation algorithm for training neural networks.

Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as classification) and regression (also known as function approximation). The supervised learning paradigm is also applicable to sequential data (e.g., for speech and gesture recognition). This can be thought of as learning with a "teacher", in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.
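A minimal sketch of this training loop follows: gradient descent on the mean-squared error for a tiny one-hidden-layer perceptron. The toy data, network size and learning rate are all assumed; this illustrates the idea only and is not the recognizer used in this project.

% Gradient descent on the MSE cost for a small two-layer tanh network.
X = linspace(0, 2*pi, 50);            % 50 scalar training inputs (toy data)
Y = sin(X);                           % target values
nH = 5; N = numel(Y);                 % hidden units, number of samples
W1 = randn(nH, 1); b1 = zeros(nH, 1); % input-to-hidden weights and biases
W2 = randn(1, nH); b2 = 0;            % hidden-to-output weights and bias
eta = 0.05;                           % learning rate (assumed)
for epoch = 1:2000
    H = tanh(W1 * X + repmat(b1, 1, N));  % hidden activations
    E = (W2 * H + b2) - Y;                % output error
    dOut = 2 * E / N;                     % gradient of mean(E.^2) at output
    dW2 = dOut * H';        db2 = sum(dOut);
    dH = (W2' * dOut) .* (1 - H.^2);      % back-propagate through tanh
    dW1 = dH * X';          db1 = sum(dH, 2);
    W2 = W2 - eta * dW2;    b2 = b2 - eta * db2;   % gradient steps
    W1 = W1 - eta * dW1;    b1 = b1 - eta * db1;
end
mse = mean(((W2 * tanh(W1 * X + repmat(b1, 1, N)) + b2) - Y).^2)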
2.4.9 | Unsupervised learning

In unsupervised learning, some data x is given together with a cost function to be minimized, which can be any function of the data x and the network's output f. The cost function depends on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).

As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x − f(x))^2]. Minimizing this cost gives a value of a equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in compression it could be related to the mutual information between x and f(x), whereas in statistical modeling it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximized rather than minimized.)

Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.
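The trivial example above can be verified numerically in a couple of MATLAB lines; the sample data are assumed.

% Minimal check: for f(x) = a with cost C = E[(x - a)^2], the minimizing
% constant a is the mean of the data.
x = 2 * randn(1000, 1) + 5;                   % assumed sample data
a = fminsearch(@(a) mean((x - a).^2), 0);     % numerically minimize the cost
[a, mean(x)]                                  % the two values agree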
2.4.10 | Reinforcement learning

In reinforcement learning, data are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.

More formally, the environment is modeled as a Markov decision process (MDP) with states s ∈ S and actions a ∈ A and the following probability distributions: the instantaneous cost distribution P(c_t | s_t), the observation distribution P(x_t | s_t) and the transition P(s_{t+1} | s_t, a_t), while a policy is defined as the conditional distribution over actions given the observations. Taken together, the two define a Markov chain (MC). The aim is to discover the policy that minimizes the cost; i.e., the MC for which the cost is minimal.

ANNs are frequently used in reinforcement learning as part of the overall algorithm. Dynamic programming has been coupled with ANNs (neuro-dynamic programming) by Bertsekas and Tsitsiklis and applied to multi-dimensional nonlinear problems, such as those involved in vehicle routing or natural resources management, because of the ability of ANNs to mitigate losses of accuracy even when reducing the discretization grid density for numerically approximating the solution of the original control problems.

Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.
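The "minimize the expected long-term cost" objective can be illustrated with a few lines of value iteration on a tiny, completely made-up MDP; the transition probabilities, costs and discount factor below are all assumed, and this dynamic-programming sketch stands in for the neuro-dynamic methods mentioned above.

% Value iteration on an assumed 3-state, 2-action MDP, minimizing the
% expected discounted long-term cost.
nS = 3; nA = 2;
P = rand(nS, nS, nA);                         % P(:, :, a) plays P(s'|s, a)
P = bsxfun(@rdivide, P, sum(P, 2));           % normalize each row
c = [1 0; 2 1; 0 3];                          % instantaneous cost c(s, a)
gamma = 0.9;                                  % discount factor (assumed)
V = zeros(nS, 1); Q = zeros(nS, nA);
for k = 1:200
    for a = 1:nA
        Q(:, a) = c(:, a) + gamma * P(:, :, a) * V;  % expected cost of a
    end
    V = min(Q, [], 2);                        % cheapest action per state
end
[~, policy] = min(Q, [], 2)                   % greedy cost-minimizing policy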
2.4.11 | Learning algorithms

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation.

Most of the algorithms used in training artificial neural networks employ some form of gradient descent. This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction. Evolutionary methods, simulated annealing, expectation-maximization, non-parametric methods and particle swarm optimization are some commonly used methods for training neural networks.

2.4.12 | Employing artificial neural networks

Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism that 'learns' from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential.

Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.

Learning algorithm: There are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.

Robustness: If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust. With the correct implementation, ANNs can be used naturally in online learning and large data set applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.

2.4.13 | Applications

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

2.4.13.1 | Real-life applications

The tasks artificial neural networks are applied to tend to fall within the following broad categories:
 Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling.
 Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
 Data processing, including filtering, clustering, blind source separation and compression.
 Robotics, including directing manipulators and computer numerical control.

Application areas include system identification and control (vehicle control, process control, natural resources management), quantum chemistry, game playing and decision making (backgammon, chess, poker), pattern recognition (radar systems, face identification, object recognition and more), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial
applications (automated trading systems), data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

Artificial neural networks have also been used to diagnose several cancers. An ANN-based hybrid lung cancer detection system named HLND improves the accuracy of diagnosis and the speed of lung cancer radiology. These networks have also been used to diagnose prostate cancer. The diagnoses can be used to make specific models, taken from a large group of patients, which are compared with the information of one given patient. The models do not depend on assumptions about correlations between different variables. Colorectal cancer has also been predicted using neural networks, which could predict the outcome for a patient with colorectal cancer with greater accuracy than current clinical methods. After training, the networks could predict multiple patient outcomes from unrelated institutions.

2.4.13.2 | Neural networks and neuroscience

Theoretical and computational neuroscience is the field concerned with the theoretical analysis and computational modeling of biological neural systems. Since neural systems are intimately related to cognitive processes and behavior, the field is closely related to cognitive and behavioral modeling. The aim of the field is to create models of biological neural systems in order to understand how biological systems work. To gain this understanding, neuroscientists strive to make a link between observed biological processes (data), biologically plausible mechanisms for neural processing and learning (biological neural network models) and theory (statistical learning theory and information theory).

2.4.14 | Types of models

Many models are used in the field, defined at different levels of abstraction and modeling different aspects of neural systems. They range from models of the short-term behavior of individual neurons, through models of how the dynamics of neural circuitry arise from interactions between individual neurons, to models of how behavior can arise from abstract neural modules that represent complete subsystems. These include models of the long-term and short-term plasticity of neural systems and their relations to learning and memory, from the individual neuron to the system level.
2.4.15 | Neural network software

Neural network software is used to simulate, research, develop and apply artificial neural networks, biological neural networks and, in some cases, a wider array of adaptive systems.

2.4.16 | Types of artificial neural networks

Artificial neural network types vary from those with only one or two layers of single-direction logic to complicated multi-input, many-directional feedback loops and layers. On the whole, these systems use algorithms in their programming to determine the control and organization of their functions. Some may be as simple as a one-neuron layer with an input and an output, while others can mimic complex systems such as dANN, which can mimic chromosomal DNA through sizes at the cellular level, into artificial organisms and simulate reproduction, mutation and population sizes. Most systems use "weights" to change the parameters of the throughput and the varying connections to the neurons. Artificial neural networks can be autonomous and learn by input from outside "teachers", or even be self-teaching from written-in rules.

2.4.17 | Confidence analysis of a neural network

Supervised neural networks that use an MSE cost function can use formal statistical methods to determine the confidence of the trained model. The MSE on a validation set can be used as an estimate for variance. This value can then be used to calculate the confidence interval of the output of the network, assuming a normal distribution. A confidence analysis made this way is statistically valid as long as the output probability distribution stays the same and the network is not modified.

By assigning a softmax activation function on the output layer of the neural network (or a softmax component in a component-based neural network) for categorical target variables, the outputs can be interpreted as posterior probabilities. This is very useful in classification, as it gives a certainty measure on classifications.
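The softmax interpretation is easy to demonstrate in MATLAB; the raw output activations below are assumed values chosen only for illustration.

% Minimal sketch: turning raw output-layer activations into posterior
% probabilities with a softmax, giving a certainty measure per class.
z = [2.1; 0.4; -1.3];                 % raw activations for 3 classes (assumed)
p = exp(z) ./ sum(exp(z));            % softmax: entries of p sum to 1
[pmax, c] = max(p);                   % predicted class and its certainty
fprintf('class %d with probability %.2f\n', c, pmax)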
Chapter 3 | Image Processing

3.1 | INTRODUCTION

This chapter is an introduction on how to handle images in MATLAB. When working with images in MATLAB, there are many things to keep in mind, such as loading an image, using the right format, saving the data as different data types, displaying an image, converting between different image formats, etc. This worksheet presents some of the commands designed for these operations. Most of these commands require you to have the Image Processing Toolbox installed with MATLAB. To find out if it is installed, type ver at the MATLAB prompt. This gives you a list of the toolboxes installed on your system. For further reference on image handling in MATLAB you are recommended to use MATLAB's help browser. There is an extensive (and quite good) online manual for the Image Processing Toolbox that you can access via MATLAB's help browser.

The first sections of this worksheet are quite heavy. The only way to understand how the presented commands work is to carefully work through the examples given at the end of the worksheet. Once you can get these examples to work, experiment on your own using your favorite image!

3.1.1 | What Is Digital Image Processing?

Transforming digital information representing images.

3.1.2 | Motivating Problems:

1. Improve pictorial information for human interpretation.
2. Remove noise.
3. Correct for motion, camera position, and distortion.
4. Enhance by changing contrast and color.
5. Segmentation - dividing an image up into constituent parts.
6. Representation - representing an image by some more abstract models.
7. Classification.
8. Reduce the size of image information for efficient handling.
9. Compression with loss of digital information that minimizes the loss of "perceptual" information (JPEG, GIF, MPEG).
3.2 | COLOR VISION

The color-responsive chemicals in the cones are called cone pigments and are very similar to the chemicals in the rods. The retinal portion of the chemical is the same; however, the scotopsin is replaced with photopsins. Therefore, the color-responsive pigments are made of retinal and photopsins. There are three kinds of color-sensitive pigments:

• Red-sensitive pigment
• Green-sensitive pigment
• Blue-sensitive pigment

Each cone cell has one of these pigments, so that it is sensitive to that color. The human eye can sense almost any gradation of color when red, green and blue are mixed. The peak absorbance of the blue-sensitive pigment is 445 nanometers, for the green-sensitive pigment it is 535 nanometers, and for the red-sensitive pigment it is 570 nanometers.

MATLAB stores most images as two-dimensional arrays (i.e., matrices), in which each element of the matrix corresponds to a single pixel in the displayed image. For example, an image composed of 200 rows and 300 columns of different colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images, such as RGB, require a three-dimensional array, where the first plane in the third dimension represents the red pixel intensities, the second plane represents the green pixel intensities, and the third plane represents the blue pixel intensities.

To reduce memory requirements, MATLAB supports storing image data in arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit unsigned integers. These arrays require one-eighth or one-fourth as much memory as data in double arrays. An image whose data matrix has class uint8 is called an 8-bit image; an image whose data matrix has class uint16 is called a 16-bit image.
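These storage conventions can be inspected directly at the MATLAB prompt; the demo image below ships with the Image Processing Toolbox and is used here only as an assumed stand-in for any RGB image.

% Minimal sketch: an RGB image is an m-by-n-by-3 array, one plane per color.
rgb = imread('peppers.png');          % demo image, class uint8
size(rgb)                             % rows, columns and 3 color planes
red = rgb(:, :, 1);                   % first plane: red pixel intensities
whos rgb red                          % uint8 needs only 1 byte per element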
3.2.1 | Fundamentals

A digital image is composed of pixels, which can be thought of as small dots on the screen. A digital image is an instruction of how to color each pixel. We will see in detail later on how this is done in practice. A typical size of an image is 512-by-512 pixels. Later on in the course you will see that it is convenient to let the dimensions of the image be a power of 2; for example, 2^9 = 512. In the general case we say that an image is of size m-by-n if it is composed of m pixels in the vertical direction and n pixels in the horizontal direction.

Let us say that we have an image of 512-by-1024 pixels. This means that the data for the image must contain information about 512 × 1024 = 524288 pixels, which requires a lot of memory (stored as doubles, at 8 bytes per pixel, this single image already occupies over 4 MB). Hence, compressing images is essential for efficient image processing. You will later on see how Fourier analysis and wavelet analysis can help us to compress an image significantly. There are also a few "computer scientific" tricks (for example entropy coding) to reduce the amount of data required to store an image.

3.2.2 | Image Formats Supported by MATLAB

The following image formats are supported by MATLAB:
 BMP
 HDF
 JPEG
 PCX
 TIFF
 XWD

Most images you find on the Internet are JPEG images, which is the name of one of the most widely used compression standards for images. If you have stored an image, you can usually see from the suffix what format it is stored in. For example, an image named myimage.jpg is stored in the JPEG format, and we will see later on that we can load an image of this format into MATLAB.

3.2.3 | Working Formats in MATLAB

If an image is stored as a JPEG image on your disc, we first read it into MATLAB. However, in order to start working with an image, for example to perform a wavelet transform on it, we must convert it into a different format. This section explains four common formats.
3.3 | ASPECTS OF IMAGE PROCESSING

Image Enhancement: Processing an image so that the result is more suitable for a particular application (sharpening or deblurring an out-of-focus image, highlighting edges, improving image contrast, brightening an image, or removing noise).

Image Restoration: This may be considered as reversing the damage done to an image by a known cause (removing blur caused by linear motion, removal of optical distortions).

Image Segmentation: This involves subdividing an image into constituent parts, or isolating certain aspects of an image (finding lines, circles, or particular shapes in an image; in an aerial photograph, identifying cars, trees, buildings, or roads).

3.4 | IMAGE TYPES

3.4.1 | Intensity Image (Gray Scale Image)

This is the equivalent of a "gray scale image" and this is the image we will mostly work with in this course. It represents an image as a matrix where every element has a value corresponding to how bright/dark the pixel at the corresponding position should be colored. There are two ways to represent the number that describes the brightness of the pixel:

The double class (or data type). This assigns a floating-point number ("a number with decimals") between 0 and 1 to each pixel. The value 0 corresponds to black and the value 1 corresponds to white.

The other class is called uint8, which assigns an integer between 0 and 255 to represent the brightness of a pixel. The value 0 corresponds to black and 255 to white. The class uint8 only requires roughly 1/8 of the storage compared to the class double. On the other hand, many mathematical functions can only be applied to the double class. We will see later how to convert between double and uint8.

Fig. (3.1)
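The two brightness representations can be compared directly; the demo file below ships with the Image Processing Toolbox and stands in for any gray scale image.

% Minimal sketch of the double and uint8 pixel classes.
I8 = imread('cameraman.tif');         % uint8 intensity image, values 0..255
Id = im2double(I8);                   % double intensity image, values 0..1
max(Id(:))                            % 1 in double corresponds to 255 in uint8
I8back = im2uint8(Id);                % convert back for compact storage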
3.4.2 | Binary Image

This image format also stores an image as a matrix, but can only color a pixel black or white (and nothing in between). It assigns a 0 for black and a 1 for white.

3.4.3 | Indexed Image

This is a practical way of representing color images. (In this course we will mostly work with gray scale images, but once you have learned how to work with a gray scale image you will also know the principle of how to work with color images.) An indexed image stores an image as two matrices. The first matrix has the same size as the image, with one number for each pixel. The second matrix is called the color map, and its size may be different from the image. The numbers in the first matrix are instructions for which entry of the color map matrix to use when coloring each pixel.

Fig. (3.2)

3.4.4 | RGB Image

This is another format for color images. It represents an image with three matrices of sizes matching the image format. Each matrix corresponds to one of the colors red, green or blue and gives an instruction of how much of each of these colors a certain pixel should use.

3.4.5 | Multi-frame Image

In some applications we want to study a sequence of images. This is very common in biological and medical imaging, where you might study a sequence of slices of a cell. For these cases, the multi-frame format is a convenient way of working with a sequence of images. In case you choose to work with biological imaging later on in this course, you may use this format.
3.5 | HOW TO?

3.5.1 | How to Convert Between Different Formats

The following table shows how to convert between the different formats given above. All these commands require the Image Processing Toolbox!

Table (3.1): Image format conversion (within the parentheses you type the name of the image you wish to convert)

Operation                                                    MATLAB command
Convert intensity/indexed/RGB format to binary format        dither()
Convert intensity format to indexed format                   gray2ind()
Convert indexed format to intensity format                   ind2gray()
Convert indexed format to RGB format                         ind2rgb()
Convert a regular matrix to intensity format by scaling      mat2gray()
Convert RGB format to intensity format                       rgb2gray()
Convert RGB format to indexed format                         rgb2ind()

The command mat2gray is useful if you have a matrix representing an image whose values range between, let's say, 0 and 1000. The command mat2gray automatically rescales all entries so that they fall between 0 (black) and 1 (white), returning an intensity image of class double.
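A short, assumed example putting a few of these conversion commands together (any RGB image will do in place of the demo file):

% Minimal sketch using the conversion commands from Table (3.1).
rgb = imread('peppers.png');          % RGB demo image (assumed file)
I = rgb2gray(rgb);                    % RGB -> intensity
[Xind, map] = rgb2ind(rgb, 64);       % RGB -> indexed with a 64-color map
I2 = ind2gray(Xind, map);             % indexed -> intensity
M = mat2gray(magic(8));               % scale an arbitrary matrix into [0, 1]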
3.5.2 | How to Read Files

When you encounter an image you want to work with, it is usually in the form of a file (for example, if you download an image from the web, it is usually stored as a JPEG file). Once we are done processing an image, we may want to write it back to a JPEG file so that we can, for example, post the processed image on the web. This is done using the imread and imwrite commands. These commands require the Image Processing Toolbox!

Table (3.2): Reading and writing image files

Operation                                                    MATLAB command
Read an image. (Within the parentheses you type the name     imread()
of the image file you wish to read. Put the file name
within single quotes.)
Write an image to a file. (As the first argument within      imwrite()
the parentheses you type the name of the image you have
worked with. As the second argument you type the name of
the file and format that you want to write the image to.
Put the file name within single quotes.)

Make sure to use a semicolon (;) after these commands, otherwise you will get LOTS of numbers scrolling on your screen... The commands imread and imwrite support the formats given in the section "Image Formats Supported by MATLAB" above.
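For example (the file names here are assumed):

% Minimal sketch of reading and writing image files.
A = imread('myimage.jpg');            % read a JPEG file into a matrix
imwrite(A, 'mycopy.png');             % write it back out, here as PNG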
3.5.3 | Loading and Saving Variables in MATLAB

This section explains how to load and save variables in MATLAB. Once you have read a file, you probably convert it into an intensity image (a matrix) and work with this matrix. Once you are done, you may want to save the matrix representing the image in order to continue working with it at another time. This is easily done using the commands save and load. Note that save and load are commonly used MATLAB commands that work independently of which toolboxes are installed.

Table (3.3): Loading and saving variables

Operation                   MATLAB command
Save the variable X         save X
Load the variable X         load X

3.5.4 | How to Display an Image in MATLAB

Here are a couple of basic MATLAB commands (which do not require any toolbox) for displaying an image.

Table (3.4): Displaying an image given in matrix form

Operation                                                    MATLAB command
Display an image represented as the matrix X                 imagesc(X)
Adjust the brightness: s is a parameter such that            brighten(s)
-1 < s < 0 gives a darker image and 0 < s < 1 gives a
brighter image
Change the colors to gray                                    colormap(gray)

Sometimes your image may not be displayed in gray scale even though you might have converted it into a gray scale image. You can then use the command colormap(gray) to "force" MATLAB to use a gray scale when displaying an image. If you are using MATLAB with the Image Processing Toolbox installed, I recommend you to use the command imshow to display an image.

Table (3.5): Displaying an image given in matrix form (with the Image Processing Toolbox)

Operation                                                    MATLAB command
Display an image represented as the matrix X                 imshow(X)
Zoom in (using the left and right mouse buttons)             zoom on
Turn off the zoom function                                   zoom off
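A quick, assumed example combining the display commands above (the demo file ships with the Image Processing Toolbox):

% Minimal sketch of the display commands.
X = imread('cameraman.tif');
imagesc(X); colormap(gray);           % basic display, forced gray scale
brighten(0.3);                        % brighten the current color map
figure, imshow(X);                    % preferred display with the toolbox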
3.6 | SOME IMPORTANT DEFINITIONS

3.6.1 | Imread Function

A = imread(filename, fmt) reads a grayscale or true color image named filename into A. If the file contains a grayscale intensity image, A is a two-dimensional array. If the file contains a true color (RGB) image, A is a three-dimensional (m-by-n-by-3) array.

3.6.2 | Rotation

>> B = imrotate(A, ANGLE, METHOD)

Where:
A: your image.
ANGLE: the angle (in degrees) by which you want to rotate your image in the counterclockwise direction.
METHOD: a string naming the interpolation method ('nearest', 'bilinear' or 'bicubic'; see below). If you omit the METHOD argument, IMROTATE uses the default method of 'nearest'.

Note: to rotate the image clockwise, specify a negative angle. The returned image matrix B is, in general, larger than A to include the whole rotated image. IMROTATE sets invalid values on the periphery of B to 0.

3.6.3 | Scaling

IMRESIZE resizes an image of any type using the specified interpolation method.

3.6.4 | Interpolation

The supported interpolation methods are:
 'nearest' (default): nearest neighbor interpolation
 'bilinear': bilinear interpolation
 'bicubic': bicubic interpolation

B = IMRESIZE(A, M, METHOD) returns an image that is M times the size of A. If M is between 0 and 1.0, B is smaller than A. If M is greater than 1.0, B is larger than A. If METHOD is omitted, IMRESIZE uses nearest neighbor interpolation.

B = IMRESIZE(A, [MROWS MCOLS], METHOD) returns an image of size MROWS-by-MCOLS. If the specified size does not produce the same aspect ratio as the input image, the output image is distorted.

>> a = imread('image.fmt');  % put your image in place of image.fmt
>> B = imresize(a, [100 100], 'nearest');
>> imshow(B);
>> B = imresize(a, [100 100], 'bilinear');
>> imshow(B);
>> B = imresize(a, [100 100], 'bicubic');
>> imshow(B);
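A combined, assumed example of the rotation and scaling commands above (the demo file ships with the Image Processing Toolbox):

% Minimal sketch of imrotate and imresize together.
A = imread('cameraman.tif');
B = imrotate(A, 35, 'bilinear');      % 35 degrees counterclockwise
C = imrotate(A, -35, 'bilinear');     % negative angle rotates clockwise
D = imresize(B, 0.5, 'bicubic');      % shrink the rotated image to half size
imshow(D)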
3.7 | EDGE DETECTION

3.7.1 | Canny Edge Detector

1. Low error rate of detection: the results should match human perception well.
2. Good localization of edges: the distance between actual edges in an image and the edges found by a computational algorithm should be minimized.
3. Single response: the algorithm should not return multiple edge pixels when only a single edge exists.

3.7.2 | Edge Detectors

Fig. (3.4) and Fig. (3.5): black-and-white and color test images with the corresponding Canny and Sobel edge-detector outputs.

3.7.3 | Edge Tracing

b = rgb2gray(a);  % convert to gray scale first
edge(b, 'prewitt');
edge(b, 'sobel');
edge(b, 'sobel', 'vertical');
edge(b, 'sobel', 'horizontal');
edge(b, 'sobel', 'both');

We can only do edge tracing using gray scale images (i.e., images without color).
>> BW = rgb2gray(A);
>> edge(BW, 'prewitt')

Fig. (3.6): the result of Prewitt edge detection.

>> edge(BW, 'sobel', 'vertical')
>> edge(BW, 'sobel', 'horizontal')
>> edge(BW, 'sobel', 'both')

Table (3.6): Data types

Type      Description                       Range
int8      8-bit integer                     -128 to 127
uint8     8-bit unsigned integer            0 to 255
int16     16-bit integer                    -32768 to 32767
double    double-precision real number      machine specific

3.8 | MAPPING

3.8.1 | Mapping Images onto Surfaces

Overview
Mapping an image onto geometry, also known as texture mapping, involves overlaying an image or function onto a geometric surface. Images may be realistic, such as satellite images, or representational, such as color-coded functions of temperature or elevation. Unlike volume visualizations, which render each voxel (volume element) of a three-dimensional scene, mapping an image onto geometry efficiently creates the appearance of complexity by simply layering an image onto a surface. The resulting realism of the display also provides information that is not as readily apparent as with a simple display of either the image or the geometric surface.

Mapping an image onto a geometric surface is a two-step process. First, the image is mapped onto the geometric surface in object space. Second, the surface undergoes view transformations (relating to the viewpoint of the observer) and is then displayed in 2D screen space. You can use IDL Direct Graphics or Object Graphics to display images mapped onto geometric surfaces. The following table introduces the tasks and routines.

Table (3.7): Tasks and Routines Associated with Mapping an Image onto Geometry

Routine(s)/Object(s)                         Description
SHADE_SURF                                   Display the elevation data.
IDLgrWindow::Init, IDLgrView::Init,          Initialize the objects necessary for an
IDLgrModel::Init                             Object Graphics display.
IDLgrSurface::Init                           Initialize a surface object containing
                                             the elevation data.
IDLgrImage::Init                             Initialize an image object containing
                                             the satellite image.
XOBJVIEW                                     Display the object in an interactive IDL
                                             utility allowing rotation and resizing.

3.8.2 | Mapping an Image onto Elevation Data

The following Object Graphics example maps a satellite image from the Los Angeles, California vicinity onto a DEM (Digital Elevation Model) containing the area's topographical features. The realism resulting from mapping the image onto the corresponding elevation data provides a more informative view of the area's topography. The process is segmented into the following three sections:
• "Opening Image and Geometry Files"
• "Initializing the IDL Display Objects"
• "Displaying the Image and Geometric Surface Objects"
Note: Data can be either regularly gridded (defined by a 2D array) or irregularly gridded (defined by irregular x, y, z points). Both the image and elevation data used in this example are regularly gridded. If you are dealing with irregularly gridded data, use GRIDDATA to map the data to a regular grid.

Complete the following steps for a detailed description of the process.

Example Code: See elevation_object.pro in the examples/doc/image subdirectory of the IDL installation directory for code that duplicates this example. Run the example procedure by entering elevation_object at the IDL command prompt, or view the file in an IDL Editor window by entering .EDIT elevation_object.pro.

Opening Image and Geometry Files: The following steps read in the satellite image and DEM files and display the elevation data.

1. Select the satellite image:

imageFile = FILEPATH('elev_t.jpg', $
  SUBDIRECTORY = ['examples', 'data'])

2. Import the JPEG file:

READ_JPEG, imageFile, image

3. Select the DEM file:

demFile = FILEPATH('elevbin.dat', $
  SUBDIRECTORY = ['examples', 'data'])

4. Define an array for the elevation data, open the file, read in the data and close the file:

dem = READ_BINARY(demFile, DATA_DIMS = [64, 64])

5. Enlarge the size of the elevation array for display purposes:

dem = CONGRID(dem, 128, 128, /INTERP)

6. To quickly visualize the elevation data before continuing on to the Object Graphics section, initialize the display, create a window and display the elevation data using the SHADE_SURF command:

DEVICE, DECOMPOSED = 0
WINDOW, 0, TITLE = 'Elevation Data'
SHADE_SURF, dem

Fig. (3.7): Visual Display of the Elevation Data

After reading in the satellite image and DEM data, continue with the next section to create the objects necessary to map the satellite image onto the elevation surface.

3.8.3 | Initializing the IDL Display Objects

After reading in the image and surface data in the previous steps, you will need to create objects containing the data. When creating an IDL Object Graphics display, it is necessary to create a window object (oWindow), a view object (oView) and a model object (oModel). These display objects, shown in the conceptual representation in the following figure, will contain a geometric surface object (the DEM data) and an image object (the satellite image). These user-defined objects are instances of existing IDL object classes and provide access to the properties and methods associated with each object class.
Note: The XOBJVIEW utility (described in "Mapping an Image Object onto a Sphere") automatically creates window and view objects.

Complete the following steps to initialize the necessary IDL objects.

1. Initialize the window, view and model display objects. For detailed syntax, arguments and keywords available with each object initialization, see IDLgrWindow::Init, IDLgrView::Init and IDLgrModel::Init. The following three lines use the basic syntax

oNewObject = OBJ_NEW('Class_Name')

to create these objects:

oWindow = OBJ_NEW('IDLgrWindow', RETAIN = 2, COLOR_MODEL = 0)
oView = OBJ_NEW('IDLgrView')
oModel = OBJ_NEW('IDLgrModel')

2. Assign the elevation surface data, dem, to an IDLgrSurface object. The IDLgrSurface::Init keyword STYLE = 2 draws the elevation data using a filled line style:

oSurface = OBJ_NEW('IDLgrSurface', dem, STYLE = 2)

3. Assign the satellite image to a user-defined IDLgrImage object using IDLgrImage::Init:

oImage = OBJ_NEW('IDLgrImage', image, INTERLEAVE = 0, $
  /INTERPOLATE)

INTERLEAVE = 0 indicates that the satellite image is organized using pixel interleaving, and therefore has the dimensions (3, m, n). The INTERPOLATE keyword forces bilinear interpolation instead of using the default nearest neighbor interpolation method.

3.8.4 | Displaying the Image and Geometric Surface Objects

This section displays the objects created in the previous steps. The image and surface objects will first be displayed in an IDL Object Graphics window and then with the interactive XOBJVIEW utility.
1. Center the elevation surface object in the display window. The default Object Graphics coordinate system is [-1, -1] to [1, 1]. To center the object in the window, position the lower left corner of the surface data at [-0.5, -0.5, -0.5] for the x, y and z dimensions:

oSurface -> GetProperty, XRANGE = xr, YRANGE = yr, $
  ZRANGE = zr
xs = NORM_COORD(xr)
xs[0] = xs[0] - 0.5
ys = NORM_COORD(yr)
ys[0] = ys[0] - 0.5
zs = NORM_COORD(zr)
zs[0] = zs[0] - 0.5
oSurface -> SetProperty, XCOORD_CONV = xs, $
  YCOORD_CONV = ys, ZCOORD_CONV = zs

2. Map the satellite image onto the geometric elevation surface using the IDLgrSurface TEXTURE_MAP keyword:

oSurface -> SetProperty, TEXTURE_MAP = oImage, $
  COLOR = [255, 255, 255]

For the clearest display of the texture map, set COLOR = [255, 255, 255]. If the image does not have dimensions that are exact powers of 2, IDL resamples the image into a larger size whose dimensions are the next powers of two greater than the original dimensions. This resampling may cause unwanted sampling artifacts. In this example, the image does have dimensions that are exact powers of two, so no resampling occurs.

Note: If your texture does not have dimensions that are exact powers of 2 and you do not want to introduce resampling artifacts, you can pad the texture with unused data to a power of two and tell IDL to map only a subset of the texture onto the surface. For example, if your image is 40 by 40, create a 64 by 64 image and fill part of it with the image data:

textureImage = BYTARR(64, 64, /NOZERO)
textureImage[0:39, 0:39] = image  ; image is 40 by 40
oImage = OBJ_NEW('IDLgrImage', textureImage)

Then, construct texture coordinates that map the active part of the texture to a surface (oSurface):

textureCoords = [[], [], [], []]
oSurface -> SetProperty, TEXTURE_COORD = textureCoords

The surface object in IDL 5.6 has been enhanced to perform the above calculation automatically. In the above example, just use the image data (the 40 by 40 array) to create the image texture and do not supply texture coordinates; IDL computes the appropriate texture coordinates to correctly use the 40 by 40 image.

Note: Some graphics devices have a limit for the maximum texture size. If your texture is larger than the maximum size, IDL scales it down into dimensions that work on the device. This rescaling may introduce resampling artifacts and loss of detail in the texture. To avoid this, use the TEXTURE_HIGHRES keyword to tell IDL to draw the surface in smaller pieces that can be texture-mapped without loss of detail.

3. Add the surface object, covered by the satellite image, to the model object. Then add the model to the view object:

oModel -> Add, oSurface
oView -> Add, oModel

4. Rotate the model for better display in the object window. Without rotating the model, the surface is displayed at a 90-degree elevation angle, containing no depth information. The following lines rotate the model 90 degrees away from the viewer along the x-axis, then 30 degrees clockwise along the y-axis and the x-axis:

oModel -> ROTATE, [1, 0, 0], -90
oModel -> ROTATE, [0, 1, 0], 30
oModel -> ROTATE, [1, 0, 0], 30

5. Display the result in the Object Graphics window:

oWindow -> Draw, oView

Fig. (3.9): Image Mapped onto a Surface in an Object Graphics Window
6. Display the results using XOBJVIEW, setting SCALE = 1 (instead of the default value of 1/SQRT(3)) to increase the size of the initial display:

XOBJVIEW, oModel, /BLOCK, SCALE = 1

This results in the following display:

Fig. (3.10): Displaying the Image Mapped onto the Surface in XOBJVIEW

After displaying the model, you can rotate it by clicking in the application window and dragging your mouse. Select the magnify button, then click near the middle of the image. Drag your mouse away from the center of the display to magnify the image, or toward the center of the display to shrink the image. Select the left-most button on the XOBJVIEW toolbar to reset the display.

7. Destroy unneeded object references after closing the display windows:

OBJ_DESTROY, [oView, oImage]

The oModel and oSurface objects are automatically destroyed when oView is destroyed. For an example of mapping an image onto a regular surface using both Direct and Object Graphics displays, see "Mapping an Image onto a Sphere".
3.8.5 | Mapping an Image onto a Sphere

The following example maps an image containing a color representation of world elevation onto a sphere using both Direct and Object Graphics displays. The example is broken down into two sections:
• "Mapping an Image onto a Sphere Using Direct Graphics"
• "Mapping an Image Object onto a Sphere"

3.9 | MAPPING OFFLINE

In the absence of a network or online navigation services, we can still identify and follow a track by using image processing techniques. We incorporate a map image of the places familiar to the person and determine how to reach them and return clearly and safely. We calculate distances using the MATLAB function imdistline and, assuming a walking speed, estimate the time it takes to get from one point to another. We then guide the person with voice commands, for example to move forward or backward, or to turn left or right. In this way, the mapping feature can work in specific places (home, malls, libraries, etc.) without being online.
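The idea can be sketched in a few MATLAB lines; the map file name, the map scale and the walking speed below are all assumed values, not taken from the project code.

% Minimal sketch of the offline-mapping distance and time estimate.
map = imread('home_map.png');         % hypothetical stored map of a known place
imshow(map);
h = imdistline(gca);                  % draggable distance line between two points
api = iptgetapi(h);                   % access the tool's programmatic API
d_pixels = api.getDistance();         % measured distance in pixels
metersPerPixel = 0.05;                % assumed map scale
speed = 1.2;                          % assumed walking speed in m/s
d = d_pixels * metersPerPixel;        % distance in meters
t = d / speed;                        % estimated travel time in seconds
fprintf('Distance %.1f m, about %.0f s of walking.\n', d, t);

The resulting time and distance would then be converted to the voice commands described above (forward, backward, left, right) to guide the user along the stored route.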