Mansoura University
Faculty of Engineering
Dept. of Electronics and
Communication Engineering

Smart Blind Stick
A B. Sc. Project in

Electronics and Communications Engineering

Supervised by

Assist. Prof. Mohamed Abdel-Azim
Eng. Ahmed Shabaan, Eng. Mohamed Gamal, Eng. Eman Ashraf

Department of Electronics and Communications Engineering
Faculty of Engineering-Mansoura University

2011-2012
Team Work

No. | Name                          | Contact Information
----|-------------------------------|-----------------------------
1   | Ahmed Helmy Abd-Ghaffar       | Ahmed2033@gmail.com
2   | Nesma Zein El-Abdeen Mohammed | eng_nesma.zein@yahoo.com
3   | Aya Gamal Osman El-Mansy      | eng_tota_20@hotmail.com
4   | Fatma Ayman Mohammed          | angel_whisper89@hotmail.com
5   | Ahmed Moawad Abo-Elenin Awad  | ahmedmowd@gmail.com

Acknowledgement
We would like to express our gratitude to our advisor and supervisor Dr.
Mohammed Abd ElAzim for guiding this work with interest. We would also like
to thank Eng. Ahmed Shaaban, Eng. Mohammed Gamal, and Eng. Eman Ashraf,
teaching assistants, for the countless hours they spent in the labs. We are
grateful to them for setting high standards and giving us the freedom to explore.
We would also like to thank our colleagues for their assistance and constant
support.
Our Team

Abstract
According to the World Health Organization, approximately 36.9 million people
in the world were blind in 2002. The majority of them use a conventional white
cane to aid navigation. The limitation of the white cane is that information is
gained only by touching objects with the tip of the cane, and its traditional
length depends on the height of the user, extending from the floor to the
person's sternum. We therefore design an ultrasound sensor that detects barriers
of any shape or height and warns the user with vibration.

Blind people also face great problems moving from place to place in town, and
the only existing aid is a guide dog, which can cost about $20,000 and remains
useful for only about 5-6 years. We therefore design a GPS aid for blind people
that helps the user move from place to place in town with spoken directions; the
user names the destination by voice alone, with no need to type anything.

To help the user move indoors, or in closed places visited daily, we design an
indoor navigation system that works offline and guides the user, again by voice
commands, between locations in specific places such as homes, malls, and
libraries.

The user may also face great problems controlling electric devices, so we design
a fully wireless control system that lets the user operate all his electric devices
easily by voice. It is connected to a security system that warns the user, whether
indoors or out, if anything goes wrong, and helps him solve the problem.

Contents

Chapter-01: Introduction
  1.1  Problem Definition
  1.2  Problem Solution
  1.3  Business Model
  1.4  Block Diagram
  1.5  Detailed Technical Description
  1.6  Pre-Project Planning
  1.7  Time Planning

Chapter-02: Speech Recognition
  2.1  Introduction
  2.2  Literature Review
       2.2.1  Pattern Recognition
       2.2.2  Generation of Voice
       2.2.3  Voice as Biometric
       2.2.4  Speech Recognition
       2.2.5  Speaker Recognition
       2.2.6  Speech/Speaker Modeling
  2.3  Implementation Details
       2.3.1  Pre-Processing and Feature Extraction
  2.4  Artificial Neural Networks
       2.4.1  Introduction
       2.4.2  Models
       2.4.3  Network Function
       2.4.4  ANN Dependency Graph
       2.4.5  Learning
       2.4.6  Choosing a Cost Function
       2.4.7  Learning Paradigms
       2.4.8  Supervised Learning
       2.4.9  Unsupervised Learning
       2.4.10 Reinforcement Learning
       2.4.11 Learning Algorithms
       2.4.12 Employing Artificial Neural Networks
       2.4.13 Applications
       2.4.14 Types of Models
       2.4.15 Neural Network Software
       2.4.16 Types of Artificial Neural Networks
       2.4.17 Confidence Analysis of Neural Networks

Chapter-03: Image Processing
  3.1  Introduction
       3.1.1  What Is Digital Image Processing?
       3.1.2  Motivating Problems
  3.2  Color Vision
       3.2.1  Fundamentals
       3.2.2  Image Formats Supported by MATLAB
       3.2.3  Working Formats in MATLAB
  3.3  Aspects of Image Processing
  3.4  Image Types
       3.4.1  Intensity Image
       3.4.2  Binary Image
       3.4.3  Indexed Image
       3.4.4  RGB Image
       3.4.5  Multi-Frame Image
  3.5  How To
       3.5.1  How to Convert Between Different Formats
       3.5.2  How to Read a File
       3.5.3  Loading and Saving Variables in MATLAB
       3.5.4  How to Display an Image in MATLAB
  3.6  Some Important Definitions
       3.6.1  Imread Function
       3.6.2  Rotation
       3.6.3  Scaling
       3.6.4  Interpolation
  3.7  Edge Detection
       3.7.1  Canny Edge Detection
       3.7.2  Edge Tracing
  3.8  Mapping
       3.8.1  Mapping an Image onto a Surface: Overview
       3.8.2  Mapping an Image onto Elevation Data
       3.8.3  Initializing the IDL Display Objects
       3.8.4  Displaying Image and Geometric Surface Objects
       3.8.5  Mapping an Image onto a Sphere
  3.9  Mapping Offline

Chapter-04: GPS Navigation
  4.1  Introduction
       4.1.1  What Is GPS?
       4.1.2  How Does It Work?
  4.2  Basic Concepts of GPS
  4.3  Position Calculation
  4.4  Communication
  4.5  Message Format
  4.6  Satellite Frequencies
  4.7  Navigation Equations
  4.8  Bancroft's Method
  4.9  Trilateration
  4.10 Multidimensional Newton-Raphson Calculation
  4.11 Additional Methods for More Than Four Satellites
  4.12 Error Sources and Analysis
  4.13 Accuracy Enhancement and Surveying
       4.13.1 Augmentation
       4.13.2 Precise Monitoring
  4.14 Time Keeping
       4.14.1 Time Keeping and Leap Seconds
       4.14.2 Time Keeping Accuracy
       4.14.3 Time Keeping Format
       4.14.4 Carrier Phase Tracking
  4.15 GPS Navigation

Chapter-05: Ultrasound
  5.1  Introduction
       5.1.1  History
  5.2  Wave Motion
  5.3  Wave Characteristics
  5.4  Ultrasound Intensity
  5.5  Ultrasound Velocity
  5.6  Attenuation of Ultrasound
  5.7  Reflection
  5.8  Refraction
  5.9  Absorption
  5.10 Hardware Part
       5.10.1 Introduction
       5.10.2 Calculating the Distance
       5.10.3 Changing Beam Pattern and Beam Width
       5.10.4 The Development of the Sensor

Chapter-06: Microcontroller
  6.1  Introduction
       6.1.1  History of Microcontrollers
       6.1.2  Embedded Design
       6.1.3  Interrupts
       6.1.4  Programs
       6.1.5  Other Microcontroller Features
       6.1.6  Higher Integration
       6.1.7  Programming Environment
  6.2  Types of Microcontrollers
       6.2.1  Interrupt Latency
  6.3  Microcontroller Embedded Memory Technology
       6.3.1  Data
       6.3.2  Firmware
  6.4  PIC Microcontroller
       6.4.1  Family Core Architecture
  6.5  PIC Components
       6.5.1  Logic Circuit
       6.5.2  Power Supply
  6.6  Development Tools
       6.6.1  Device Programmers
       6.6.2  Debugging
  6.7  LCD Display
       6.7.1  LCD Display Pins
       6.7.2  LCD Screen
       6.7.3  LCD Memory
       6.7.4  LCD Basic Commands
       6.7.5  LCD Connection
       6.7.6  LCD Initialization

Chapter-07: System Implementation
  7.1  Introduction
  7.2  Survey
  7.3  Searches
       7.3.1  Ultrasound Sensor
       7.3.2  Indoor Navigation Systems
       7.3.3  Outdoor Navigation
  7.4  Sponsors
  7.5  Pre-Design
       7.5.1  List of Metrics
       7.5.2  Competitive Benchmarking Information
       7.5.3  Ideal and Marginally Acceptable Target Values
       7.5.4  Time Plan Diagram
  7.6  Design
       7.6.1  Speech Recognition
       7.6.2  Ultrasound Sensors
       7.6.3  Outdoor Navigation
  7.7  Product Architecture
       7.7.1  Product Schematic
       7.7.2  Rough Geometric Layout
       7.7.3  Incidental Interactions
  7.8  Defining Secondary Systems
  7.9  Detailed Interface Specifications
  7.10 Establishing the Architecture of the Chunks

Chapter-08: Conclusion
  8.1  Introduction
  8.2  Overview
       8.2.1  Outdoor Navigation
              8.2.1.1  Outdoor Navigation Online
              8.2.1.2  Outdoor Navigation Offline
       8.2.2  Ultrasound Sensor
       8.2.3  Object Identifier
  8.3  Features
CHAPTER 1
Introduction

1.1 | PROBLEM DEFINITION
According to the World Health Organization, approximately 36.9 million people
in the world were blind in 2002. The majority of them use a conventional white
cane to aid navigation. The limitation of the white cane is that information is
gained only by touching objects with the tip of the cane. The traditional length of
a white cane depends on the height of the user, extending from the floor to the
person's sternum.
Blind people also face great problems moving from place to place in town, and
the only existing aid is a guide dog, which can cost about $20,000 and remains
useful for only about 5-6 years. They also have great difficulty identifying the
objects they frequently use in the house, such as kitchen tools and clothes, and
they may struggle to control their electric devices or to deal with a security
problem on their own.
1.2 | PROBLEM SOLUTION
We are trying to solve all of the previous problems. To help the user move easily
indoors and outdoors, we use an ultrasound sensor to detect barriers in the way
and alert the user in two ways: a vibration motor whose speed increases as the
distance decreases, and a voice alert that announces the distance between the
user and the barrier.

To solve the problem of moving outside the home from place to place, we design
smartphone software that guides the user by voice, without any external help: the
user simply says the destination, and the phone gives spoken directions until it is
reached. To help the user identify objects, we use RFID; every important object
carries a tag with an ID, and when the reader reads the ID, the system announces
by voice what the object is. Inside the home, we design a system to control all
electronic devices by voice commands, together with a security system designed
especially for blind users. Its most important part is the fire alarm: when a fire is
detected, it alerts the user by a call to his mobile phone and places another call to
nearby friends for help. The security system also warns the user if he forgets to
close his door. After finishing these applications, we plan to add features after
graduation, using new technologies to help the user move in the street more
easily, cross roads, and read books. The products available for blind people in the
Egyptian market do not cover any of these needs.


The blind user needs to move, control his surroundings, and do his tasks by
himself without help from anybody, yet all that exists today is a plain white stick
without any technology or features. So, finally, we install a sensor and an RFID
reader on the white stick; the other part is software on the mobile phone that
performs the navigation and automation tasks.
1.3 | BUSINESS MODEL
Our customers are blind and visually impaired people; almost 1 million people in
Egypt have one of the problems described above.

Our product covers several of our customers' needs: it helps them avoid the
barriers in their way and tells them, by voice, the direction to take; and it lets
them move freely, without any external help, in different places through an
Android application on the mobile phone, designed especially for them, that
guides them by voice along roads and tells them which direction to take to reach
their goal.

To reach our goal, we met with different customers to learn exactly what they
need, which helped us form a vision for a final product that is comfortable to use.
We were also guided technically by our sponsors to find the best way to cover all
these needs.

In our market, the available products do not cover any of these needs; we found
only a plain white stick without any technology to help the user.
1.4 | BLOCK DIAGRAMS

Fig.(1.1): General Project Block Diagram


1.5 | DETAILED TECHNICAL DESCRIPTION
Our project is built on the simplest available technologies, in a way that is
comfortable for the user, so we divided it into two parts: software and hardware.
The hardware part consists of a PIC MCU, an MP3 module, a camera module,
and an ultrasound sensor module. The software part is an Android application
installed on the mobile phone.
The hardware part operates under two conditions: indoor and outdoor.
Indoors, a single sensor measures ranges; when the user comes within 2 cm of an
object, the camera module takes a photo of it to detect the code placed on the
object and sends the photo to the MCU, which processes it, identifies the code
number, looks up the object's name in a database, then connects to the WT588D
MP3 module, fetches the address of the MP3 file containing that name, and plays
it out through the speaker.
Outdoors, three HC-SR04 ultrasonic sensors are activated in three directions to
determine the best way (the one with no barriers on it) and send the measured
data to the MCU; the MCU picks the best way and sends the address of the MP3
clip containing the wanted direction, which becomes the output.
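As a rough illustration of this direction-selection idea (a sketch only, not the
actual PIC firmware; the function name and the 100 cm threshold are our own
assumptions), the logic reduces to picking the sensor that reports the largest free
distance:

```python
# Illustrative sketch only -- the real logic runs on the PIC MCU.
# Names and the threshold value are assumptions, not project values.
def choose_direction(left_cm, front_cm, right_cm, threshold_cm=100):
    readings = {"left": left_cm, "front": front_cm, "right": right_cm}
    best = max(readings, key=readings.get)   # direction with most free space
    if readings[best] < threshold_cm:        # every direction is blocked
        return "stop"
    return best                              # maps to an MP3 clip address
```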
For outdoor navigation, we design an Android application using Google Maps:
the user states the destination by voice, the application determines his current
position using GPS, the digital compass provides the viewing angle, and the
application guides him in the right direction using the GPS and compass data.
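A minimal sketch of this guidance computation, assuming the standard
great-circle initial-bearing formula (the function name and signature are ours,
not the application's):

```python
import math

# Sketch: initial bearing from the current GPS fix to the target,
# compared against the digital-compass heading (all names assumed).
def turn_angle(lat1, lon1, lat2, lon2, compass_deg):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = (math.cos(p1) * math.sin(p2)
         - math.sin(p1) * math.cos(p2) * math.cos(dlon))
    bearing = math.degrees(math.atan2(y, x)) % 360     # 0..360 from north
    return (bearing - compass_deg + 180) % 360 - 180   # -180..180; +ve = turn right
```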

[Figure: mode-selection buttons — left and right buttons choose between the
Outdoor and Indoor modes]

Fig. (1.2): Button Configuration


Fig. (1.3): Indoor & Outdoor Processes Block Diagram

1.6 | PRE-PROJECT PLANNING
We started by searching for a problem that nobody cares about, and we found
that blind people's problems receive little attention and that suitable products are
not available in Egypt. So we judged this a good field to start in: an opportunity
to solve a real problem and to enter a new market segment with a low number of
competitors.
1.7 | TIME PLANNING
Project Timing:
The three main parts are independent in execution time, but each part has many
branches which are serial in execution time.
Timing of Product Introduction:
The timing of launching the product depends on marketing and on re-studying
the market against competing products, which must have low cost and high
quality.

Technology Readiness:
Technology is one of the fundamental components of the product, because
Android and ultrasonic technology are gaining good acceptance among Egyptian
customers.
Market Readiness:
The market is always ready for a new product; products compete in the market
so as to give customers the best one for them.
The Product Plan:
This plan makes the project comfortable in its implementation, because anything
arranged or planned in advance gives the best results.

CHAPTER 2

Speech Recognition

2.1 | INTRODUCTION
Biometrics is, in the simplest definition, something you are. It is a physical
characteristic unique to each individual such as fingerprint, retina, iris, speech.
Biometrics has a very useful application in security; it can be used to authenticate a
person’s identity and control access to a restricted area, based on the premise that
the set of these physical characteristics can be used to uniquely identify
individuals. The speech signal conveys two important types of information:
primarily the speech content, and on a secondary level the speaker identity.
Speech recognizers aim to extract the lexical information from the speech signal
independently of the speaker by reducing the inter-speaker variability. Speaker
recognition, on the other hand, is concerned with extracting the identity of the
person speaking the utterance. So both speech recognition and speaker
recognition are possible from the same voice input.
In our project we use the speech recognition technique, because we want to
recognize the word on which the stick will then act.
Mel Frequency Cepstral Coefficients (MFCCs) are used as features for both
speech and speaker recognition. We also add energy features and the delta and
delta-delta features of the energy and the MFCCs. After calculating the features,
neural networks are used to model the speech. Based on the speech model, the
system decides whether or not the uttered speech matches what the user was
prompted to utter.
2.2 | LITERATURE REVIEW
2.2.1 | Pattern Recognition
Pattern recognition, one of the branches of artificial intelligence, sub-section
of machine learning, is the study of how machines can observe the environment,
learn to distinguish patterns of interest from their background, and make sound and
reasonable decisions about the categories of the patterns. A pattern can be a
fingerprint image, a handwritten cursive word, a human face, a speech signal, a
sales pattern, etc.
The applications of pattern recognition include data mining, document
classification, financial forecasting, organization and retrieval of multimedia
databases, and biometrics (personal identification based on various physical
attributes such as face, retina, speech, ear and fingerprints). The essential steps of
pattern recognition are: Data Acquisition, Preprocessing, Feature Extraction,
Training and Classification.
Features are used to denote the descriptor. Features must be selected so that
they are discriminative and invariant. They can be represented as a vector, matrix,
tree, graph, or string.
They are ideally similar for objects in the same class and very different for
objects in different classes. A pattern class is a family of patterns that share some
common properties. Pattern recognition by machine involves techniques for
assigning patterns to their respective classes automatically and with as little
human intervention as possible.
Learning and Classification usually use one of the following approaches:
Statistical Pattern Recognition is based on statistical characterizations of patterns,
assuming that the patterns are generated by a probabilistic system. Syntactical (or
Structural) Pattern Recognition is based on the structural interrelationships of
features. Given a pattern, its recognition/classification may consist of one of the
following two tasks according to the type of learning procedure:
1) Supervised Classification (e.g., Discriminant Analysis) in which the input pattern
is identified as a member of a predefined class.
2) Unsupervised Classification (e.g., clustering) in which the pattern is assigned to
a previously unknown class.

Fig. (2.1): General block diagram of pattern recognition system


2.2.2 | Generation of Voice
Speech begins with the generation of an airstream, usually by the lungs and
diaphragm, a process called initiation. This air then passes through the larynx,
where it is modulated by the glottis (vocal cords). This step is called phonation or
voicing, and is responsible for the generation of pitch and tone. Finally, the
modulated air is filtered by the mouth, nose, and throat - a process called
articulation - and the resultant pressure wave excites the air.

Fig. (2.2): Vocal Schematic

Depending upon the positions of the various articulators different sounds are
produced. Position of articulators can be modeled by linear time- invariant system
that has frequency response characterized by several peaks called formants. The
change in frequency of formants characterizes the phoneme being articulated.
As a consequence of this physiology, we can notice several characteristics of
the frequency domain spectrum of speech. First of all, the oscillation of the glottis
results in an underlying fundamental frequency and a series of harmonics at
multiples of this fundamental. This is shown in the figure below, where we have
plotted a brief audio waveform for the phoneme /i: / and its magnitude spectrum.
The fundamental frequency (180 Hz) and its harmonics appear as spikes in the
spectrum. The location of the fundamental frequency is speaker dependent, and is
a function of the dimensions and tension of the vocal cords. For adults it usually
falls between 100 Hz and 250 Hz, with the female average significantly higher
than the male average.

Fig. (2.3): Audio Sample for /i: / phoneme showing stationary property of phonemes for a short period

The sound comes out in phonemes, which are the building blocks of speech.
Each phoneme resonates at a fundamental frequency and its harmonics, and thus
has high energy at those frequencies; in other words, each phoneme has different
formants. It is this feature that enables the identification of each phoneme at the
recognition stage.
Fig. (2.4): Audio Magnitude Spectrum for /i:/ phoneme showing fundamental frequency and its harmonics

The variations in inter-speaker features of the speech signal during the utterance
of a word are modeled in word training for speech recognition; for speaker
recognition, the intra-speaker variations in features over long speech content are
modeled.
Besides the configuration of articulators, the acoustic manifestation of a phoneme
is affected by:
- Physiology and emotional state of the speaker.
- Phonetic context.
- Accent.
2.2.3 | Voice as Biometric
The underlying premise for voice authentication is that each person’s voice
differs in pitch, tone, and volume enough to make it uniquely distinguishable.
Several factors contribute to this uniqueness: size and shape of the mouth, throat,
nose, and teeth (articulators) and the size, shape, and tension of the vocal cords.
The chance that all of these are exactly the same in any two people is very low.
Voice biometrics has the following advantages over other forms of biometrics:
- It is a natural signal to produce.
- Implementation cost is low, since it doesn't require a specialized input device.
- It is acceptable to users.
- It mixes easily with other forms of authentication for multifactor
  authentication, and it is the only biometric that allows users to authenticate
  remotely.
2.2.4 | Speech Recognition
Speech is the dominant means for communication between humans, and
promises to be important for communication between humans and machines, if it
can just be made a little more reliable.
Speech recognition is the process of converting an acoustic signal to a set of
words. The applications include voice commands and control, data entry, voice
user interface, automating the telephone operator’s job in telephony, etc. They can
also serve as the input to natural language processing. There are two variants of
speech recognition, based on the duration of the speech signal:
Isolated word recognition, in which each word is surrounded by some sort of
pause, is much easier than recognizing continuous speech, in which words run into
each other and have to be segmented. Speech recognition is a difficult task because
of the many sources of variability associated with the signal. First, the acoustic
realizations of phonemes, the smallest sound units of which words are composed,
are highly dependent on context. Second, acoustic variability can result from
changes in the environment as well as in the position and characteristics of the
transducer. Third, within-speaker variability can result from changes in the
speaker's physical and emotional state, speaking rate, or voice quality. Finally,
differences in sociolinguistic background, dialect, and vocal tract size and shape
contribute to cross-speaker variability. Such variability is modeled in various
ways: at the level of signal representation, a representation that emphasizes
speaker-independent features is developed.
2.2.5 | Speaker Recognition
Speaker recognition is the process of automatically recognizing who is
speaking on the basis of individual’s information included in speech waves.
Speaker recognition can be classified into identification and verification. Speaker
recognition has been applied most often as means of biometric authentication.
2.2.5.1 | Types of Speaker Recognition
Speaker Identification
Speaker identification is the process of determining which registered speaker
provides a given utterance. In a Speaker Identification (SID) system, no identity
claim is provided; the test utterance is scored against a set of known (registered)
references for each potential speaker, and the one whose model best matches the
test utterance is selected. There are two types of speaker identification tasks,
closed-set and open-set. In closed-set identification, the test utterance belongs to
one of the registered speakers.
During testing, a matching score is estimated for each registered speaker. The
speaker corresponding to the model with the best matching score is selected. This
requires N comparisons for a population of N speakers. In open-set, any speaker
can access the system; those who are not registered should be rejected. This
requires another model referred to as garbage model or imposter model or
background model, which is trained with data provided by other speakers different
from the registered speakers.
During testing, the matching score corresponding to the best speaker model is
compared with the matching score estimated using the garbage model in order to
accept or reject the speaker, making the total number of comparisons equal to
N + 1. Speaker identification performance tends to decrease as the population
size increases.
Speaker verification
Speaker verification, on the other hand, is the process of accepting or
rejecting the identity claim of a speaker. That is, the goal is to automatically accept
or reject an identity that is claimed by the speaker. During testing, a verification
score is estimated using the claimed speaker model and the anti-speaker model.
This verification score is then compared to a threshold. If the score is higher than
the threshold, the speaker is accepted, otherwise, the speaker is rejected.
Thus, speaker verification involves a hypothesis test requiring a simple
binary decision: accept or reject the claimed identity regardless of the population
size. Hence, the performance is quite independent of the population size, but it
depends on the number of test utterances used to evaluate the performance of the
system.
2.2.6 | Speaker/Speech Modeling
There are various pattern modeling/matching techniques. They include
Dynamic Time Warping (DTW), Gaussian Mixture Model (GMM), Hidden
Markov Modeling (HMM), Artificial Neural Network (ANN), and Vector
Quantization (VQ). These are used interchangeably for speech and speaker
modeling. A widely used approach is statistical learning: GMM for speaker
recognition, which models the variations in the features of a speaker over a long
sequence of utterances.
And another statistical method widely used for speech recognition is HMM.
HMM models the Markovian nature of speech signal where each phoneme
represents a state and sequence of such phonemes represents a word. Sequence of
Features of such phonemes from different speakers is modeled by HMM.
2.3 | IMPLEMENTATION DETAILS
The implementation of the system includes a common pre-processing and feature
extraction module, plus speaker-independent speech modeling and classification
by ANNs.
2.3.1 | Pre-Processing and Feature Extraction


Starting from the capturing of audio signal, feature extraction consists of the
following steps as shown in the block diagram below:
[Block diagram — Speech Signal → Silence Removal → Pre-emphasis → Framing
→ Windowing → DFT → Mel Filter Bank → Log → IDFT → CMS →
12 MFCC, 12 ΔMFCC, 12 ΔΔMFCC; Energy branch → Delta →
1 energy, 1 Δ energy, 1 ΔΔ energy]

Fig. (2.5): Pre-Processing and Feature Extraction

2.3.1.1 | Capture







The first step in processing speech is to convert the analog representation
(first air pressure, and then analog electric signals in a microphone) into a digital
signal x[n], where n is an index over time. Analysis of the audio spectrum shows
that nearly all energy resides in the band between DC and 4 kHz, and beyond
10 kHz there is virtually no energy whatsoever.
Used sound format:
- 22050 Hz
- 16-bit, signed
- Little endian
- Mono channel
- Uncompressed PCM
2.3.1.2 | End point detection and Silence removal
The captured audio signal may contain silence at different positions, such as the
beginning of the signal, between the words of a sentence, or the end of the signal. If
silent frames are included, modeling resources are spent on parts of the signal
which do not contribute to the identification. The silence present must be removed
before further processing. There are several ways for doing this: most popular are
Short-Time Energy and Zero-Crossing Rate, but they have their own limitations
regarding thresholds that must be set on an ad hoc basis. The algorithm we used
exploits statistical properties of the background noise as well as physiological
aspects of speech production, and does not assume any ad hoc threshold.
It assumes that the background noise present in the utterances is Gaussian in
nature. Usually the first 200 ms or more (we used 4410 samples at a sampling
rate of 22050 samples/sec) of a speech recording corresponds to silence (or
background noise), because the speaker takes some time to start reading when
recording begins.
Endpoint Detection Algorithm:
Step 1:
Calculate the mean (μ) and standard deviation (σ) of the first 200 ms of samples
of the given utterance. The background noise is characterized by this μ and σ.
Step 2:
Go from the first sample to the last sample of the speech recording. For each
sample, check whether the one-dimensional Mahalanobis distance |x − μ|/σ is
greater than 3. If it is, the sample is treated as voiced; otherwise it is
unvoiced/silence. This threshold rejects up to 99.7% of the noise samples, since
P[|x − μ| ≤ 3σ] = 0.997 for a Gaussian distribution, thus accepting only the
voiced samples.
Step 3:
Mark the voiced sample as 1 and unvoiced sample as 0. Divide the whole
speech signal into 10 ms non-overlapping windows. Represent the complete speech
by only zeros and ones.
Step 4:
Consider a window containing M zeros and N ones. If M ≥ N, convert each of
the ones to zeros, and vice versa. This method is adopted keeping in mind that
the speech production system, consisting of the vocal cords, tongue, vocal tract,
etc., cannot change abruptly within the short time window taken here (10 ms).
Step 5:
Collect the voiced part only, according to the samples labeled '1' in the
windowed array, and dump it into a new array. This retrieves the voiced part of
the original speech signal.
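The steps above translate almost directly into code. The following is a minimal
sketch (our own helper, not the project's implementation), assuming a 1-D
NumPy signal sampled at 22050 Hz:

```python
import numpy as np

def remove_silence(x, fs=22050, noise_len=4410, win_ms=10):
    # Step 1: noise statistics from the first ~200 ms (assumed silence).
    mu, sigma = x[:noise_len].mean(), x[:noise_len].std()
    # Steps 2-3: one-dimensional Mahalanobis distance test, 3-sigma rule.
    voiced = np.abs(x - mu) / sigma > 3.0
    # Step 4: majority vote inside 10 ms non-overlapping windows.
    win = int(fs * win_ms / 1000)
    for s in range(0, len(x), win):
        seg = voiced[s:s + win]
        seg[:] = seg.sum() > len(seg) - seg.sum()   # N ones vs M zeros
    # Step 5: keep only the samples labelled voiced.
    return x[voiced]
```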


Fig. (2.6): Input signal to End-point detection system

Fig. (2.7): Output signal from End point Detection System

2.3.1.3 | PCM Normalization
The extracted pulse-code-modulated amplitude values are normalized, to avoid
amplitude variation introduced during capture.
2.3.1.4 | Pre-emphasis
Usually the speech signal is pre-emphasized before any further processing. If we
look at the spectrum of voiced segments like vowels, there is more energy at the
lower frequencies than at the higher frequencies. This drop in energy across
frequencies is caused by the nature of the glottal pulse. Boosting the
high-frequency energy makes information from the higher formants more
available to the acoustic model and improves phone detection accuracy. The
pre-emphasis filter is a first-order high-pass filter. In the time domain, with input
x[n] and 0.9 ≤ α ≤ 1.0, the filter equation is:

y[n] = x[n] − α x[n−1]

We used α = 0.95.
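As a one-line sketch (assuming a 1-D NumPy signal; the helper name is ours):

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    # y[n] = x[n] - alpha * x[n-1]; the first sample is kept unchanged.
    return np.append(x[0], x[1:] - alpha * x[:-1])
```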


Fig. (2.8): Signal before Pre-Emphasis

Fig.(2.9): Signal after Pre-Emphasis

2.3.1.5 | Framing and windowing
Speech is a non-stationary signal, meaning that its statistical properties are not
constant across time. Instead, we want to extract spectral features from a small
window of speech that characterizes a particular subphone, for which we can
make the (rough) assumption that the signal is stationary (i.e., its statistical
properties are constant within this region). We used frame blocks of 23.22 ms
with 50% overlap, i.e., 512 samples per frame.


Fig.(2.10): Frame Blocking of the Signal

The rectangular window (i.e., no window) can cause problems when we do
Fourier analysis: it abruptly cuts off the signal at its boundaries. A good window
function has a narrow main lobe and low side-lobe levels in its transfer function,
and it shrinks the values of the signal toward zero at the window boundaries,
avoiding discontinuities. The most commonly used window function in speech
processing is the Hamming window, defined as follows:

w[n] = 0.54 − 0.46 cos(2πn / (N − 1)),   0 ≤ n ≤ N − 1

where N is the window length (512 samples here).

Fig.(2.11): Hamming window

The extraction of the signal takes place by multiplying the value of the signal at
time n, s_frame[n], by the value of the window at time n, s_w[n]:

y[n] = s_w[n] × s_frame[n]
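A sketch of the frame blocking and windowing described above (names are ours;
it assumes the pre-emphasized NumPy signal y):

```python
import numpy as np

def frame_and_window(y, frame_len=512, overlap=0.5):
    hop = int(frame_len * (1 - overlap))        # 256-sample hop (50% overlap)
    n_frames = 1 + (len(y) - frame_len) // hop
    window = np.hamming(frame_len)              # 0.54 - 0.46*cos(2*pi*n/(N-1))
    frames = np.stack([y[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * window                      # taper every frame
```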


Fig.(2.12): A single frame before and after windowing

2.3.1.6 | Discrete Fourier Transform
A Discrete Fourier Transform (DFT) of the windowed signal is used to extract
the frequency content (the spectrum) of the current frame. The tool for extracting
spectral information i.e., how much energy the signal contains at discrete
frequency bands for a discrete-time (sampled) signal is the Discrete Fourier
Transform or DFT. The input to the DFT is a windowed signal x[n]...x[m], and the
output, for each of N discrete frequency bands, is a complex number X[k]
representing the magnitude and phase of that frequency component in the original
signal.
X[k] = | Σ_{n=0}^{N−1} x[n] e^{−j2πkn/N} |,   k = 0, 1, …, N−1

The commonly used algorithm for computing the DFT is the Fast Fourier
Transform or in short FFT.
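In code this reduces to a single call over the windowed frames from the previous
step (a sketch, with our own function name):

```python
import numpy as np

def magnitude_spectrum(frames, n_fft=512):
    # One-sided magnitude spectrum per frame: shape (n_frames, n_fft//2 + 1).
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
```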

2.3.1.7 | Mel Filter
For calculating the MFCC, first, a transformation is applied according to the
following formula:
mel(x) = 2595 log10(1 + x / 700)

where x is the linear frequency. Then, a filter bank is applied to the
amplitude of the Mel-scaled spectrum. The Mel frequency warping is most
conveniently done by utilizing a filter bank with filters centered according to Mel
frequencies. The width of the triangular filters varies according to the Mel scale, so
that the log total energy in a critical band around the center frequency is included.
The centers of the filters are uniformly spaced in the Mel scale.

Fig.(2.13): Equally spaced Mel values

The result of the Mel filter bank is information about the distribution of energy
in each Mel-scale band: we obtain a vector of outputs, one per filter, for each
frame.

Fig. (2.14): Triangular filter bank in frequency scale

We have used 30 filters in the filter bank.
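A sketch of such a 30-filter triangular bank (our own construction, assuming
512-point FFT frames at 22050 Hz, following the warping formula above):

```python
import numpy as np

def mel_filter_bank(n_filters=30, n_fft=512, fs=22050):
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Centers uniformly spaced on the Mel scale, mapped back to FFT bins.
    pts = inv_mel(np.linspace(0.0, mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fbank  # apply as: filter_energies = magnitudes @ fbank.T
```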


2.3.1.8 | Cepstrum by Inverse Discrete Fourier Transform
The cepstrum transform is applied to the filter outputs in order to obtain the
MFCC features of each frame. The triangular filter outputs Y(i), i = 1, 2, …, M
are compressed using the logarithm, and the discrete cosine transform (DCT) is
applied. Here, M is the number of filters in the filter bank, i.e., 30.

C[n] = Σ_{i=1}^{M} log(Y(i)) · cos(πn(i − 0.5) / M)

where C[n] is the MFCC vector for each frame.
The resulting vector is called the Mel-frequency cepstrum (MFC), and the
individual components are the Mel-frequency Cepstral coefficients (MFCCs). We
extracted 12 features from each speech frame.
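A sketch of this log-compression and DCT step (SciPy's dct is used as an
assumed convenience; skipping the 0th coefficient is our illustrative choice):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_filter_bank(filter_energies, n_ceps=12):
    log_e = np.log(filter_energies + 1e-10)     # log-compress, avoid log(0)
    # Type-II DCT along the filter axis; keep 12 coefficients (skip C0).
    return dct(log_e, type=2, axis=1, norm="ortho")[:, 1:n_ceps + 1]
```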
2.3.1.9 | Post Processing
Cepstral Mean Subtraction (CMS)
A speech signal may be subjected to some channel noise when recorded, also
referred to as the channel effect. A problem arises if the channel effect when
recording training data for a given person is different from the channel effect in
later recordings when the person uses the system. The problem is that a false
distance between the training data and newly recorded data is introduced due to the
different channel effects. The channel effect is eliminated by subtracting the
mean Mel-cepstrum coefficients from the Mel-cepstrum coefficients:

ĉ_t(n) = c_t(n) − (1/T) Σ_{τ=1}^{T} c_τ(n)

where T is the number of frames in the utterance.
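In code, CMS is a single line over the (frames × coefficients) MFCC matrix from
the previous step (the variable name is assumed):

```python
mfcc_cms = mfcc - mfcc.mean(axis=0)   # subtract per-coefficient utterance mean
```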

The energy feature
The energy in a frame is the sum over time of the power of the samples in the
frame; thus for a signal x in a window from time sample t1 to time sample t2 the
energy is:
E = Σ_{t=t1}^{t2} x²[t]

Delta feature
Another interesting fact about the speech signal is that it is not constant from
frame to frame. Co-articulation (influence of a speech sound during another
adjacent or nearby speech sound) can provide a useful cue for phone identity. It
can be preserved by using delta features. Velocity (delta) and acceleration
(delta-delta) coefficients are usually obtained from the static window-based
information. These delta and delta-delta coefficients model the speed and
acceleration of the variation of the cepstral feature vectors across adjacent
windows. A simple way to compute deltas would be just to compute the
difference between frames; thus the delta value d(t) for a particular cepstral
value c(t) at time t can be estimated as:
d(t) = (c(t+1) − c(t−1)) / 2
The differencing method is simple, but since it acts as a high-pass filtering
operation on the parameter domain, it tends to amplify noise. The solution is
linear regression, i.e., fitting a first-order polynomial; the least-squares solution
is easily shown to be of the following form:
d[t] = ( Σ_{m=1}^{M} m · (c[t+m] − c[t−m]) ) / ( 2 Σ_{m=1}^{M} m² )

where M is the regression window size. We used M = 4.
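A sketch of these regression-based deltas with M = 4 (edge frames padded by
repetition; all names are ours):

```python
import numpy as np

def deltas(feats, M=4):
    # feats: (n_frames, n_features); returns an array of the same shape.
    denom = 2 * sum(m * m for m in range(1, M + 1))
    padded = np.pad(feats, ((M, M), (0, 0)), mode="edge")
    d = np.zeros_like(feats, dtype=float)
    for m in range(1, M + 1):
        d += m * (padded[M + m:M + m + len(feats)]      # c[t+m]
                  - padded[M - m:M - m + len(feats)])   # c[t-m]
    return d / denom

# Delta-deltas are simply deltas(deltas(feats)).
```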








Composition of Feature Vector
We calculated 39 features from each frame:
- 12 MFCC features.
- 12 delta MFCC.
- 12 delta-delta MFCC.
- 1 energy feature.
- 1 delta energy feature.
- 1 delta-delta energy feature.
2.4 | ARTIFICIAL NEURAL NETWORKS
2.4.1 | Introduction
We have used ANNs to model our system: voices are used for training, and test
utterances are classified into word categories, each of which triggers an action.
Here we give an overview of artificial neural networks.
The original inspiration for the term Artificial Neural Network came from
examination of central nervous systems and their neurons, axons, dendrites, and
synapses, which constitute the processing elements of biological neural networks
investigated by neuroscience. In an artificial neural network, simple artificial
nodes, variously called "neurons", "neurodes", "processing elements" (PEs) or
"units", are connected together to form a network of nodes mimicking the
biological neural networks — hence the term "artificial neural network".
Because neuroscience is still full of unanswered questions, and since there are
many levels of abstraction and therefore many ways to take inspiration from the
brain, there is no single formal definition of what an artificial neural network is.
Generally, it involves a network of simple processing elements that exhibit
complex global behavior determined by connections between processing elements
and element parameters. While an artificial neural network does not have to be
adaptive per se, its practical use comes with algorithms designed to alter the
strength (weights) of the connections in the network to produce a desired signal
flow.
These networks are also similar to the biological neural networks in the sense
that functions are performed collectively and in parallel by the units, rather than
there being a clear delineation of subtasks to which various units are assigned (see
also connectionism). Currently, the term Artificial Neural Network (ANN) tends to
refer mostly to neural network models employed in statistics, cognitive psychology
and artificial intelligence. Neural network models designed with emulation of the
central nervous system (CNS) in mind are a subject of theoretical neuroscience and
computational neuroscience.
In modern software implementations of artificial neural networks, the
approach inspired by biology has been largely abandoned for a more practical
approach based on statistics and signal processing. In some of these systems,
neural networks or parts of neural networks (such as artificial neurons) are used as
components in larger systems that combine both adaptive and non-adaptive
elements. While the more general approach of such adaptive systems is more
suitable for real-world problem solving, it has far less to do with the traditional
artificial intelligence connectionist models. What they do have in common,
however, is the principle of non-linear, distributed, parallel and local processing
and adaptation. Historically, the use of neural network models marked a
paradigm shift in the late eighties from high-level (symbolic) artificial
intelligence, characterized by expert systems with knowledge embodied in
if-then rules, to low-level (sub-symbolic) machine learning, characterized by
knowledge embodied in the parameters of a dynamical system.
2.4.2 | Models


Neural network models in artificial intelligence are usually referred to as
artificial neural networks (ANNs); these are essentially simple mathematical
models defining a function f : X → Y or a distribution over X or over both X and
Y, but sometimes models are also intimately associated with a particular learning
algorithm or learning rule. A common use of the phrase ANN model really means
the definition of a class of such functions (where members of the class are
obtained by varying parameters, connection weights, or specifics of the
architecture such as the number of neurons or their connectivity).
2.4.3 | Network Function
The word network in the term 'artificial neural network' refers to the inter–
connections between the neurons in the different layers of each system. An
example system has three layers. The first layer has input neurons, which send data
via synapses to the second layer of neurons, and then via more synapses to the
third layer of output neurons. More complex systems will have more layers of
neurons with some having increased layers of input neurons and output neurons.
The synapses store parameters called "weights" that manipulate the data in the
calculations. An ANN is typically defined by three types of parameters:
 The interconnection pattern between different layers of neurons
 The learning process for updating the weights of the interconnections
 The activation function that converts a neuron's weighted input to its output
activation.
Mathematically, a neuron's network function is defined as a composition of
other functions, which can further be defined as a composition of other functions.
This can be conveniently represented as a network structure, with arrows depicting
the dependencies between variables. A widely used type of composition is the
nonlinear weighted sum, f(x) = K(Σ_i w_i g_i(x)), where K (commonly referred
to as the activation function) is some predefined function, such as the hyperbolic
tangent. It will be convenient for the following to refer to a collection of
functions g_i as simply a vector g = (g_1, g_2, …, g_n).
2.4.4 | ANN dependency graph
This figure depicts such a decomposition of f, with dependencies between
variables indicated by arrows. These can be interpreted in two ways.
The first view is the functional view: the input x is transformed into a
3-dimensional vector h, which is then transformed into a 2-dimensional vector g,
which is finally transformed into f. This view is most commonly encountered in
the context of optimization.


The second view is the probabilistic view: the random variable F = f(G) depends
upon the random variable G = g(H), which depends upon H = h(X), which
depends upon the random variable X. This view is most commonly encountered
in the context of graphical models.
The two views are largely equivalent. In either case, for this particular network
architecture, the components of individual layers are independent of each other
(e.g., the components of g are independent of each other given their input h).
This naturally enables a degree of parallelism in the implementation.

[Figure: two separate depictions of the recurrent ANN dependency graph]
Networks such as the previous one are commonly called feedforward, because
their graph is a directed acyclic graph. Networks with cycles are commonly
called recurrent. Such networks are commonly depicted in the manner shown at
the top of the figure, where f is shown as being dependent upon itself. However,
an implied temporal dependence is not shown.
2.4.5 | Learning
What has attracted the most interest in neural networks is the possibility of
learning. Given a specific task to solve and a class of functions F, learning means
using a set of observations to find f* ∈ F which solves the task in some optimal
sense. This entails defining a cost function C : F → ℝ such that, for the optimal
solution f*, C(f*) ≤ C(f) for all f ∈ F; i.e., no solution has a cost less than the
cost of the optimal solution (see mathematical optimization).
The cost function is an important concept in learning, as it is a measure of
how far away a particular solution is from an optimal solution to the problem to be
solved. Learning algorithms search through the solution space to find a function
that has the smallest possible cost.
For applications where the solution is dependent on some data, the cost must
necessarily be a function of the observations; otherwise we would not be modeling
anything related to the data. It is frequently defined as a statistic to which only
approximations can be made. As a simple example, consider the problem of
finding the model f which minimizes C = E[(f(x) − y)²], for data pairs (x, y)
drawn from some distribution D. In practical situations we would only have N
samples from D; thus, for the above example, we would only minimize
Ĉ = (1/N) Σ_{i=1}^{N} (f(x_i) − y_i)². Hence the cost is minimized over a
sample of the data rather than the entire data set.


When N → ∞, some form of online machine learning must be used, where the
cost is partially minimized as each new example is seen. While online machine
learning is often used when D is fixed, it is most useful in the case where the
distribution changes slowly over time. In neural network methods, some form of
online machine learning is frequently used for finite datasets.
2.4.6 | Choosing a cost function
While it is possible to define some arbitrary, ad hoc cost function, frequently a
particular cost will be used, either because it has desirable properties (such as
convexity) or because it arises naturally from a particular formulation of the
problem (e.g., in a probabilistic formulation the posterior probability of the model
can be used as an inverse cost). Ultimately, the cost function will depend on the
desired task. An overview of the three main categories of learning tasks is provided
below.
2.4.7 | Learning paradigms
There are three major learning paradigms, each corresponding to a particular
abstract learning task. These are supervised learning, unsupervised learning and
reinforcement learning.
2.4.8 | Supervised learning
In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y,
and the aim is to find a function f : X → Y in the allowed class of functions that
matches the examples. In other words, we wish to infer the mapping implied by
the data; the cost function is related to the mismatch between our mapping and
the data, and it implicitly contains prior knowledge about the problem domain.
A commonly used cost is the mean-squared error, which tries to minimize the
average squared error between the network's output, f(x), and the target value y
over all the example pairs. When one tries to minimize this cost using gradient
descent for the class of neural networks called multilayer perceptrons, one obtains
the common and well-known back-propagation algorithm for training neural
networks.
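As a toy sketch of this setting (a one-hidden-layer network trained by gradient
descent on the mean-squared error; every name and hyperparameter here is our
own illustration, not the project's network):

```python
import numpy as np

def train_mlp(X, Y, hidden=16, lr=0.1, epochs=1000):
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.1, (hidden, Y.shape[1]))
    for _ in range(epochs):
        H = np.tanh(X @ W1)                 # hidden activations
        err = H @ W2 - Y                    # d(MSE)/d(output), up to a constant
        grad_W2 = H.T @ err / len(X)        # output-layer gradient
        dH = (err @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh
        W1 -= lr * X.T @ dH / len(X)        # hidden-layer update
        W2 -= lr * grad_W2                  # output-layer update
    return W1, W2
```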
Tasks that fall within the paradigm of supervised learning are pattern
recognition (also known as classification) and regression (also known as function
approximation). The supervised learning paradigm is also applicable to sequential
data (e.g., for speech and gesture recognition). This can be thought of as learning
with a "teacher," in the form of a function that provides continuous feedback on the
quality of solutions obtained thus far.
2.4.9 | Unsupervised learning
In unsupervised learning, some data x is given together with a cost function to be
minimized, which can be any function of the data x and the network's output f(x).
The cost function is dependent on the task (what we are trying to model) and
our a priori assumptions (the implicit properties of our model, its parameters and
the observed variables).
As a trivial example, consider the model f(x) = a, where a is a constant, and the
cost C = E[(x − f(x))²]. Minimizing this cost will give us a value of a that is
equal to the mean of the data.
The cost function can be much more complicated. Its form depends on the
application: for example, in compression it could be related to the mutual
information between x and f(x), whereas in statistical modeling it could be
related to the posterior probability of the model given the data. (Note that in both
of those examples those quantities would be maximized rather than minimized.)
Tasks that fall within the paradigm of unsupervised learning are in general
estimation problems; the applications include clustering, the estimation of
statistical distributions, compression and filtering.
2.4.10 | Reinforcement learning
In reinforcement learning, data are usually not given, but generated by an
agent's interactions with the environment. At each point in time, the agent performs
an action and the environment generates an observation and an instantaneous cost,
according to some (usually unknown) dynamics. The aim is to discover a policy
for selecting actions that minimizes some measure of a long-term cost; i.e., the
expected cumulative cost. The environment's dynamics and the long-term cost for
each policy are usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process
(MDP) with states and actions, and with the following probability distributions:
the instantaneous cost distribution, the observation distribution, and the
transition distribution, while a policy is defined as the conditional distribution
over actions given the observations. Taken together, the two define a Markov
chain (MC). The aim is to


discover the policy that minimizes the cost; i.e., the MC for which the cost is
minimal.
ANNs are frequently used in reinforcement learning as part of the overall
algorithm. Dynamic programming has been coupled with ANNs (Neuro dynamic
programming) by Bertsekas and Tsitsiklis and applied to multi-dimensional
nonlinear problems such as those involved in vehicle routing or natural resources
management because of the ability of ANNs to mitigate losses of accuracy even
when reducing the discretization grid density for numerically approximating the
solution of the original control problems.
Tasks that fall within the paradigm of reinforcement learning are control
problems, games and other sequential decision making tasks.
2.4.11 | Learning algorithms
Training a neural network model essentially means selecting one model from
the set of allowed models (or, in a Bayesian framework, determining a distribution
over the set of allowed models) that minimizes the cost criterion. There are
numerous algorithms available for training neural network models; most of them
can be viewed as a straightforward application of optimization theory and
statistical estimation.
Most of the algorithms used in training artificial neural networks employ some
form of gradient descent. This is done by simply taking the derivative of the cost
function with respect to the network parameters and then changing those
parameters in a gradient-related direction.
Evolutionary methods, simulated annealing, expectation-maximization, nonparametric methods and particle swarm optimization are some commonly used
methods for training neural networks.
2.4.12 | Employing artificial neural networks
Perhaps the greatest advantage of ANNs is their ability to be used as an
arbitrary function approximation mechanism that 'learns' from observed data.
However, using them is not so straightforward and a relatively good understanding
of the underlying theory is essential.
Choice of model: This will depend on the data representation and the
application. Overly complex models tend to lead to problems with learning.


Learning algorithm: There are numerous trade-offs between learning
algorithms. Almost any algorithm will work well with the correct hyperparameters
for training on a particular fixed data set. However, selecting and tuning an
algorithm for training on unseen data requires a significant amount of
experimentation.
Robustness: If the model, cost function and learning algorithm are selected
appropriately the resulting ANN can be extremely robust.
With the correct implementation, ANNs can be used naturally in online
learning and large data set applications. Their simple implementation and the
existence of mostly local dependencies exhibited in the structure allows for fast,
parallel implementations in hardware.
2.4.13 | Applications
The utility of artificial neural network models lies in the fact that they can be
used to infer a function from observations. This is particularly useful in
applications where the complexity of the data or task makes the design of such a
function by hand impractical.
2.4.13.1 | Real-life applications
The tasks artificial neural networks are applied to tend to fall within the
following broad categories:
• Function approximation, or regression analysis, including time series prediction,
fitness approximation and modeling.
• Classification, including pattern and sequence recognition, novelty detection and
sequential decision making.
• Data processing, including filtering, clustering, blind source separation and
compression.
• Robotics, including directing manipulators and computer numerical control.
Application areas include system identification and control (vehicle control,
process control, natural resources management), quantum chemistry, game-playing
and decision making (backgammon, chess, poker), pattern recognition (radar
systems, face identification, object recognition and more), sequence recognition
(gesture, speech, handwritten text recognition), medical diagnosis, financial


applications (automated trading systems), data mining (or knowledge discovery in
databases, "KDD"), visualization and e-mail spam filtering.
Artificial neural networks have also been used to diagnose several cancers.
An ANN-based hybrid lung cancer detection system named HLND improves the
accuracy of diagnosis and the speed of lung cancer radiology. These networks have
also been used to diagnose prostate cancer. The diagnoses can be used to make
specific models, taken from a large group of patients, compared with the information
of one given patient.
The models do not depend on assumptions about correlations between different
variables. Colorectal cancer has also been predicted using neural networks.
Neural networks can predict the outcome for a patient with colorectal cancer
with significantly more accuracy than current clinical methods. After training, the
networks could predict multiple patient outcomes from unrelated institutions.
2.4.13.2 | Neural networks and neuroscience
Theoretical and computational neuroscience is the field concerned with the
theoretical analysis and computational modeling of biological neural systems.
Since neural systems are intimately related to cognitive processes and behavior, the
field is closely related to cognitive and behavioral modeling.
The aim of the field is to create models of biological neural systems in order
to understand how biological systems work. To gain this understanding,
neuroscientists strive to make a link between observed biological processes (data),
biologically plausible mechanisms for neural processing and learning (biological
neural network models) and theory (statistical learning theory and information
theory).
2.4.14 | Types of models
Many models are used in the field, defined at different levels of abstraction
and modeling different aspects of neural systems. They range from models of the
short-term behavior of individual neurons, models of how the dynamics of neural
circuitry arise from interactions between individual neurons and finally to models
of how behavior can arise from abstract neural modules that represent complete
subsystems. These include models of the long-term, and short-term plasticity, of
neural systems and their relations to learning and memory from the individual
neuron to the system level.


2.4.15 | Neural network software
Neural network software is used to simulate, research, develop and apply
artificial neural networks, biological neural networks and, in some cases, a wider
array of adaptive systems.
2.4.16 | Types of artificial neural networks
Artificial neural network types vary from those with only one or two layers of
single-direction logic to complicated multi-input, multi-directional feedback loops
and layers. On the whole, these systems use algorithms in their programming to
determine control and organization of their functions. Some may be as simple as a
one neuron layer with an input and an output, and others can mimic complex
systems such as dANN, which can mimic chromosomal DNA through sizes at
cellular level, into artificial organisms and simulate reproduction, mutation and
population sizes.
Most systems use "weights" to change the parameters of the throughput and
the varying connections to the neurons. Artificial neural networks can be
autonomous and learn by input from outside "teachers", or even be self-teaching
from written-in rules.
2.4.17 | Confidence analysis of a neural network
Supervised neural networks that use an MSE cost function can use formal
statistical methods to determine the confidence of the trained model. The MSE on
a validation set can be used as an estimate for variance. This value can then be
used to calculate the confidence interval of the output of the network, assuming a
normal distribution. A confidence analysis made this way is statistically valid as
long as the output probability distribution stays the same and the network is not
modified.
By assigning a softmax activation function on the output layer of the neural
network (or a softmax component in a component-based neural network) for
categorical target variables, the outputs can be interpreted as posterior
probabilities. This is very useful in classification as it gives a certainty measure on
classifications.
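The following minimal Matlab sketch illustrates both calculations; the validation errors and output activations are stand-in values, not results from this project.

% 95% confidence interval from the validation MSE (assumes the output
% errors are approximately normally distributed).
errVal = randn(100,1) * 0.1;           % stand-in validation errors
sigma  = sqrt(mean(errVal.^2));        % sqrt of MSE estimates the std. deviation
yNew   = 0.7;                          % some network output
ci95   = [yNew - 1.96*sigma, yNew + 1.96*sigma];

% Softmax over the output-layer activations gives posterior probabilities.
z = [2.0; 1.0; 0.1];                   % stand-in output activations
p = exp(z - max(z)) / sum(exp(z - max(z)));  % numerically stable softmax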

CHAPTER 3
Image Processing


3.1 | INTRODUCTION
This chapter is an introduction on how to handle images in Matlab. When
working with images in Matlab, there are many things to keep in mind such as
loading an image, using the right format, saving the data as different data types,
how to display an image, conversion between different image formats, etc. This
worksheet presents some of the commands designed for these operations. Most of
these commands require you to have the Image processing tool box installed with
MATLAB. To find out if it is installed, type ver at the Matlab prompt. This gives
you a list of the tool boxes that are installed on your system.
For further reference on image handling in Matlab you are recommended to
use Matlab's help browser. There is an extensive (and quite good) on-line manual
for the Image processing tool box that you can access via Matlab's help browser.
The first sections of this worksheet are quite heavy. The only way to
understand how the presented commands work, is to carefully work through the
examples given at the end of the worksheet. Once you can get these examples to
work, experiment on your own using your favorite image!
3.1.1 | What Is Digital Image Processing?
Transforming digital information representing images.
3.1.2 | Motivating Problems:
1. Improve pictorial information for human interpretation.
2. Remove noise.
3. Correct for motion, camera position, and distortion.
4. Enhance by changing contrast and color.
5. Segmentation - dividing an image up into constituent parts.
6. Representation - representing an image by some more abstract models.
7. Classification.
8. Reduce the size of image information for efficient handling.
9. Compression with loss of digital information that minimizes loss of "perceptual"
information (JPEG, GIF, MPEG).


3.2 | COLOR VISION
The color-responsive chemicals in the cones are called cone pigments and are
very similar to the chemicals in the rods. The retinal portion of the chemical is the
same; however, the scotopsin is replaced with photopsins. Therefore, the
color-responsive pigments are made of retinal and photopsins. There are three kinds
of color-sensitive pigments:
• Red-sensitive pigment
• Green-sensitive pigment
• Blue-sensitive pigment
Each cone cell has one of these pigments so that it is sensitive to that color.
The human eye can sense almost any gradation of color when red, green and blue
are mixed.
The wavelengths of the three types of cones (red, green and blue) are shown.
The peak absorbance of blue-sensitive pigment is 445 nanometers, for
green-sensitive pigment it is 535 nanometers, and for red-sensitive pigment it is 570
nanometers.
MATLAB stores most images as two-dimensional arrays (i.e., matrices), in
which each element of the matrix corresponds to a single pixel in the displayed
image. For example, an image composed of 200 rows and 300 columns of different
colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images,
such as RGB, require a three dimensional array, where the first plane in the 3rd
dimension represents the red pixel intensities, the second plane represents the
green pixel intensities, and the third plane represents the blue pixel intensities.
To reduce memory requirements, MATLAB supports storing image data in
arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit
unsigned integers. These arrays require one-eighth or one-fourth as much memory
as data in double arrays. An image whose data matrix has class uint8 is called an
8-bit image; an image whose data matrix has class uint16 is called a 16-bit image.
3.2.1 | Fundamentals
A digital image is composed of pixels which can be thought of as small dots
on the screen. A digital image is an instruction of how to color each pixel. We will
see in detail later on how this is done in practice. A typical size of an image is
512-by-512 pixels. Later on in the course you will see that it is convenient to let the


dimensions of the image be a power of 2. For example, 2^9 = 512. In the general
case we say that an image is of size m-by-n if it is composed of m pixels in the
vertical direction and n pixels in the horizontal direction.
Let us say that we have an image in the format 512-by-1024 pixels. This
means that the data for the image must contain information about 524288 pixels,
which requires a lot of memory! Hence, compressing images is essential for
efficient image processing. You will later on see how Fourier analysis and Wavelet
analysis can help us to compress an image significantly. There are also a few
"computer scientific" tricks (for example entropy coding) to reduce the amount of
data required to store an image.
3.2.2 | Image Formats Supported By Matlab
The following image formats are supported by Matlab:
• BMP
• HDF
• JPEG
• PCX
• TIFF
• XWD
Most images you find on the Internet are JPEG-images which is the name for
one of the most widely used compression standards for images. If you have
stored an image you can usually see from the suffix what format it is stored in. For
example, an image named myimage.jpg is stored in the JPEG format and we will
see later on that we can load an image of this format into Mat lab.
3.2.3 | Working Formats In Matlab:
If an image is stored as a JPEG-image on your disc we first read it into
Matlab. However, in order to start working with an image, for example perform a
wavelet transform on the image, we must convert it into a different format. This
section explains four common formats.
3.3 | ASPECTS OF IMAGE PROCESSING


Image Enhancement: Processing an image so that the result is more suitable for a
particular application (sharpening or deblurring an out-of-focus image, highlighting
edges, improving image contrast, brightening an image, removing noise).
Image Restoration: This may be considered as reversing the damage done to an
image by a known cause (removing blur caused by linear motion, removal of
optical distortions).
Image Segmentation: This involves subdividing an image into constituent parts,
or isolating certain aspects of an image (finding lines, circles, or particular shapes
in an image; in an aerial photograph, identifying cars, trees, buildings, or roads).
3.4 | IMAGE TYPES
3.4.1 | Intensity Image (Gray Scale Image)
This is the equivalent to a "gray scale image" and this is the image we will
mostly work with in this course. It represents an image as a matrix where every
element has a value corresponding to how bright/dark the pixel at the
corresponding position should be colored. There are two ways to represent the
number that represents the brightness of the pixel: The double class (or data type).
This assigns a floating number ("a number with decimals") between 0 and 1 to
each pixel. The value 0 corresponds to black and the value 1 corresponds to white.
The other class is called uint8 which assigns an integer between 0 and 255 to
represent the brightness of a pixel. The value 0 corresponds to black and 255 to
white. The class uint8 only requires roughly 1/8 of the storage compared to the
class double. On the other hand, many mathematical functions can only be applied
to the double class. We will see later how to convert between double and uint8.
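As a quick preview, the short sketch below illustrates the two classes and their memory cost; the file name is a placeholder, and im2double/im2uint8 require the Image processing tool box.

A8 = imread('myimage.jpg');   % typically of class uint8, values 0..255
Ad = im2double(A8);           % class double, values 0..1 (divides by 255)
B8 = im2uint8(Ad);            % back to uint8, values 0..255
whos A8 Ad                    % Ad needs roughly 8 times the memory of A8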

Fig. (3.1)


3.4.2 | Binary Image:
This image format also stores an image as a matrix but can only color a pixel
black or white (and nothing in between). It assigns a 0 for black and a 1 for white.
3.4.3 | Indexed Image:
This is a practical way of representing color images. (In this course we will
mostly work with gray scale images but once you have learned how to work with a
gray scale image you will also know the principle how to work with color images.)
An Indexed image stores an image as two matrices. The first matrix has the same
size as the image and one number for each pixel. The second matrix is called the
color map and its size may be different from the image. Each number in the first
matrix is an instruction of which entry to use in the color map matrix.

Fig. (3.2)

3.4.4 | RGB Image
This is another format for color images. It represents an image with three
matrices of sizes matching the image format. Each matrix corresponds to one of
the colors red, green or blue and gives an instruction of how much of each of these
colors a certain pixel should use.
3.4.5 | Multi-frame Image:
In some applications we want to study a sequence of images. This is very
common in biological and medical imaging where you might study a sequence of
slices of a cell. For these cases, the multi-frame format is a convenient way of


working with a sequence of images. In case you choose to work with biological
imaging later on in this course, you may use this format.

3.5 | HOW TO?
3.5.1 | How To Convert Between Different Formats:
The following table shows how to convert between the different formats given
above. All these commands require the Image processing tool box!
Table (3.1): Image format conversion (within the parentheses you type
the name of the image you wish to convert)
Operation | Matlab command
Convert intensity/indexed/RGB format to binary format. | dither()
Convert intensity format to indexed format. | gray2ind()
Convert indexed format to intensity format. | ind2gray()
Convert indexed format to RGB format. | ind2rgb()
Convert a regular matrix to intensity format by scaling. | mat2gray()
Convert RGB format to intensity format. | rgb2gray()
Convert RGB format to indexed format. | rgb2ind()

The command mat2gray is useful if you have a matrix representing an image
in which the values representing the gray scale range between, let's say, 0 and 1000.
The command mat2gray automatically rescales all entries so that they fall within
0 and 1 (the output is of class double).
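As a short example of a typical conversion chain using the commands in Table (3.1); the file name is a placeholder:

RGB = imread('myimage.jpg');      % an RGB (m-by-n-by-3) image
I   = rgb2gray(RGB);              % RGB -> intensity
[Xi, map] = gray2ind(I, 64);      % intensity -> indexed, 64-entry color map
I2  = ind2gray(Xi, map);          % indexed -> back to intensity
M   = mat2gray(40 * double(I));   % rescale an arbitrary matrix into [0, 1]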
3.5.2 | How to Read Files
When you encounter an image you want to work with, it is usually in form
of a file (for example, if you down load an image from the web, it is usually stored
as a JPEG-file). Once we are done processing an image, we may want to write it
back to a JPEG-file so that we can, for example, post the processed image on the
web. This is done using the imread and imwrite commands. These commands
require the Image processing tool box!

33
Chapter 3 | Image Processing

Table (3.2): Reading and writing image files
Operation | Matlab command
Read an image. (Within the parentheses you type the name of the image file you wish to read. Put the file name within single quotes.) | imread()
Write an image to a file. (As the first argument within the parentheses you type the name of the image you have worked with. As a second argument you type the name of the file and format that you want to write the image to. Put the file name within single quotes.) | imwrite()

Make sure to use a semicolon after these commands; otherwise you will get
lots of numbers scrolling on your screen. The commands imread and imwrite
support the formats given in the section "Image formats supported by Matlab"
above.
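For example (the file names are placeholders):

A = imread('myimage.jpg');        % read a JPEG file into a matrix
imwrite(A, 'myimage_copy.png');   % write it back out, here in PNG format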
3.5.3 | Loading And Saving Variables in Matlab
This section explains how to load and save variables in Matlab. Once you
have read a file, you probably convert it into an intensity image (a matrix) and
work with this matrix. Once you are done you may want to save the matrix
representing the image in order to continue working with it at another time. This
is easily done using the commands save and load. Note that save and load are
commonly used Matlab commands and work independently of which tool boxes
are installed.
Table (3.3): Loading and saving variables
Operation | Matlab command
Save the variable X. | save X
Load the variable X. | load X

3.5.4 | How to Display an Image in MATLAB
Here are a couple of basic Matlab commands (they do not require any tool box)
for displaying an image.


Table (3.4): Displaying an image given in matrix form
Operation | Matlab command
Display an image represented as the matrix X. | imagesc(X)
Adjust the brightness. s is a parameter such that -1<s<0 gives a darker image and 0<s<1 gives a brighter image. | brighten(s)
Change the colors to gray. | colormap(gray)

Sometimes your image may not be displayed in gray scale even though you
might have converted it into a gray scale image. You can then use the command
colormap (gray) to "force" Matlab to use a gray scale when displaying an image.
If you are using Matlab with an Image processing tool box installed, I
recommend you to use the command imshow to display an image.
Table (3.5): Displaying an image given in matrix form (with the Image processing tool box)
Operation | Matlab command
Display an image represented as the matrix X. | imshow(X)
Zoom in (using the left and right mouse button). | zoom on
Turn off the zoom function. | zoom off
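Putting the display commands together, a small sketch (the file name is a placeholder):

X = imread('myimage.jpg');              % placeholder file name
if ndims(X) == 3, X = rgb2gray(X); end  % make sure it is gray scale
imagesc(X);        % display the matrix as an image
colormap(gray);    % force a gray scale color map
brighten(0.3);     % 0 < s < 1 gives a brighter image
figure, imshow(X); % with the tool box installed, imshow is preferred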

3.6 | SOME IMPORTANT DEFINITIONS
3.6.1 | Imread Function
A = imread(filename, fmt) reads a grayscale or true color image named filename
into A. If the file contains a grayscale intensity image, A is a two-dimensional
array. If the file contains a true color (RGB) image, A is a three-dimensional
(m-by-n-by-3) array.
3.6.2 | Rotation
>> B = imrotate (A, ANGLE, METHOD)

Where;
A: Your image.
ANGLE: The angle (in degrees) you want to rotate your image in the counter
clockwise direction.
METHOD: A string that can have one of the values 'nearest', 'bilinear' or 'bicubic'.
If you omit the METHOD argument, IMROTATE uses the default method of
'nearest'.


Note: to rotate the image clockwise, specify a negative angle. The returned image
matrix B is, in general, larger than A to include the whole rotated image.
IMROTATE sets invalid values on the periphery of B to 0.
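For example, to rotate an image 30 degrees counter-clockwise with bilinear interpolation (the file name is a placeholder):

A = imread('myimage.jpg');
B = imrotate(A, 30, 'bilinear');   % a negative angle would rotate clockwise
imshow(B)                          % the periphery of B is filled with zeros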
3.6.3 | Scaling
IMRESIZE resizes an image of any type using the specified interpolation
method.
3.6.4 | Interpolation
Supported interpolation methods:
• 'nearest' (default): nearest neighbor interpolation
• 'bilinear': bilinear interpolation
• 'bicubic': bicubic interpolation
B = IMRESIZE(A,M,METHOD) returns an image that is M times the size of A. If
M is between 0 and 1.0, B is smaller than A. If M is greater than 1.0, B is larger
than A. If METHOD is omitted, IMRESIZE uses nearest neighbor interpolation.
B = IMRESIZE (A,[MROWS MCOLS],METHOD) returns an image of size
MROWS-by-MCOLS. If the specified size does not produce the same aspect ratio
as the input image has, the output image is distorted.
>> a = imread('image.fmt'); % put your image in place of image.fmt
>> B = imresize(a, [100 100], 'nearest');
>> imshow(B);
>> B = imresize(a, [100 100], 'bilinear');
>> imshow(B);
>> B = imresize(a, [100 100], 'bicubic');
>> imshow(B);

3.7 | EDGE DETECTION
3.7.1 | Canny Edge Detector
A good edge detector should satisfy three criteria:
1. Low error rate of detection: results should match human perception well.
2. Good localization of edges: the distance between actual edges in an image and
the edges found by a computational algorithm should be minimized.
3. Single response: the algorithm should not return multiple edge pixels when only
a single one exists.
3.7.2 | Edge Detectors

Fig. (3.4), Fig. (3.5): Edge detector outputs (bw, color, Canny, Sobel).

3.7.3 | Edge Tracing
b = rgb2gray(a); % convert to gray. We can only do edge tracing for gray images.
edge(b,'prewitt');
edge(b,'sobel');
edge(b,'sobel','vertical');
edge(b,'sobel','horizontal');
edge(b,'sobel','both');

We can only do edge tracing using gray scale images (i.e. images without color).


>> BW = rgb2gray(A);
>> edge(BW, 'prewitt')

Fig.(3.6)

That is what I saw!
>> edge(BW, 'sobel', 'vertical')
>> edge(BW, 'sobel', 'horizontal')
>> edge(BW, 'sobel', 'both')
Table (3.6): Data types
Type | Description | Range
int8 | 8-bit integer | -128 to 127
uint8 | 8-bit unsigned integer | 0 to 255
int16 | 16-bit integer | -32768 to 32767
double | Double precision real number | Machine specific

3.8 | MAPPING
3.8.1 | Mapping Images onto Surfaces Overview
Mapping an image onto geometry, also known as texture mapping, involves
overlaying an image or function onto a geometric surface. Images may be realistic,
such as satellite images, or representational, such as color-coded functions of
temperature or elevation. Unlike volume visualizations, which render each voxel
(volume element) of a three-dimensional scene, mapping an image onto geometry
efficiently creates the appearance of complexity by simply layering an image onto
a surface. The resulting realism of the display also provides information that is not
as readily apparent as with a simple display of either the image or the geometric
surface.
Mapping an image onto a geometric surface is a two step process. First, the
image is mapped onto the geometric surface in object space. Second, the surface
undergoes view transformations (relating to the viewpoint of the observer) and is
then displayed in 2D screen space. You can use IDL Direct Graphics or Object
Graphics to display images mapped onto geometric surfaces. The following table
introduces the tasks and routines.
Table (3.7): Tasks and Routines Associated with Mapping an Image onto Geometry
Routine(s)/Object(s) | Description
SHADE_SURF | Display the elevation data.
IDLgrWindow::Init, IDLgrView::Init, IDLgrModel::Init | Initialize the objects necessary for an Object Graphics display.
IDLgrSurface::Init | Initialize a surface object containing the elevation data.
IDLgrImage::Init | Initialize an image object containing the satellite image.
XOBJVIEW | Display the object in an interactive IDL utility allowing rotation and resizing.

3.8.2 | Mapping an Image onto Elevation Data
The following Object Graphics example maps a satellite image from the Los
Angeles, California vicinity onto a DEM (Digital Elevation Model) containing the
area's topographical features. The realism resulting from mapping the image onto
the corresponding elevation data provides a more informative view of the area’s
topography. The process is segmented into the following three sections:
• “Opening Image and Geometry Files”
• “Initializing the IDL Display Objects”
• “Displaying the Image and Geometric Surface Objects”


Note:
Data can be either regularly gridded (defined by a 2D array) or irregularly
gridded (defined by irregular x, y, z points). Both the image and elevation data used
in this example are regularly gridded. If you are dealing with irregularly gridded
data, use GRIDDATA to map the data to a regular grid.
Complete the following steps for a detailed description of the process.
Example Code:
See elevation_object.pro in the examples/doc/image subdirectory of the IDL
installation directory for code that duplicates this example. Run the example
procedure by entering elevation_object at the IDL command prompt, or view the file
in an IDL Editor window by entering .EDIT elevation_object.pro.
Opening Image and Geometry Files:
The following steps read in the satellite image and DEM files and display the
Elevation data.
1. Select the satellite image:
imageFile = FILEPATH('elev_t.jpg', $
SUBDIRECTORY = ['examples', 'data'])

2. Import the JPEG file:
READ_JPEG, imageFile, image
3. Select the DEM file:
demFile = FILEPATH('elevbin.dat', $
SUBDIRECTORY = ['examples', 'data'])

4. Define an array for the elevation data, open the file, read in the data and close
the file:
dem = READ_BINARY(demFile, DATA_DIMS = [64, 64])

5. Enlarge the size of the elevation array for display purposes:
dem = CONGRID(dem, 128, 128, /INTERP)

6. To quickly visualize the elevation data before continuing on to the Object
Graphics section, initialize the display, create a window and display the elevation
data using the SHADE_SURF command:
DEVICE, DECOMPOSED = 0


WINDOW, 0, TITLE = 'Elevation Data'
SHADE_SURF, dem

Fig.(3.7):Visual Display of the Elevation Data

After reading in the satellite image and DEM data, continue with the next
section to create the objects necessary to map the satellite image onto the elevation
surface.
3.8.3 | Initializing the IDL Display Objects
After reading in the image and surface data in the previous steps, you will
need to create objects containing the data. When creating an IDL Object Graphics
display, it is necessary to create a window object (oWindow), a view object
(oView) and a model object (oModel). These display objects, shown in the
conceptual representation in the following figure, will contain a geometric surface
object (the DEM data) and an image object (the satellite image).
These user-defined objects are instances of existing IDL object classes and
provide access to the properties and methods associated with each object class.


Note:
(The XOBJVIEW utility, described in "Mapping an Image Object onto a
Sphere", automatically creates window and view objects.)
Complete the following steps to initialize the necessary IDL objects.
1. Initialize the window, view and model display objects. For detailed syntax,
arguments and keywords available with each object initialization, see
IDLgrWindow::Init, IDLgrView::Init and IDLgrModel::Init. The following
three lines use the basic syntax:
oNewObject = OBJ_NEW('Class_Name')

To create these objects:
oWindow = OBJ_NEW('IDLgrWindow', RETAIN = 2, COLOR_MODEL = 0)
oView = OBJ_NEW('IDLgrView')
oModel = OBJ_NEW('IDLgrModel')

2. Assign the elevation surface data, dem, to an IDLgrSurface object. The
IDLgrSurface::Init keyword, STYLE = 2, draws the elevation data using a filled
line style:
oSurface = OBJ_NEW('IDLgrSurface', dem, STYLE = 2)

3. Assign the satellite image to a user-defined IDLgrImage object using
IDLgrImage::Init:
oImage = OBJ_NEW('IDLgrImage', image, INTERLEAVE = 0, $
/INTERPOLATE)
INTERLEAVE = 0 indicates that the satellite image is organized using pixel
interleaving, and therefore has the dimensions (3, m, n). The INTERPOLATE
keyword forces bilinear interpolation instead of using the default nearest neighbor
interpolation method.
3.8.4 | Displaying the Image and Geometric Surface Objects
This section displays the objects created in the previous steps. The image and
surface objects will first be displayed in an IDL Object Graphics window and then
with the interactive XOBJVIEW utility.


1. Center the elevation surface object in the display window. The default Object
Graphics coordinate system is [-1, -1], [1, 1]. To center the object in the window,
position the lower left corner of the surface data at [-0.5, -0.5, -0.5]
for the x, y and z dimensions:
oSurface -> GetProperty, XRANGE = xr, YRANGE = yr, $
ZRANGE = zr
xs = NORM_COORD(xr)
xs[0] = xs[0] - 0.5
ys = NORM_COORD(yr)
ys[0] = ys[0] - 0.5
zs = NORM_COORD(zr)
zs[0] = zs[0] - 0.5
oSurface -> SetProperty, XCOORD_CONV = xs, $
YCOORD_CONV = ys, ZCOORD_CONV = zs
2. Map the satellite image onto the geometric elevation surface using the
IDLgrSurface::Init TEXTURE_MAP keyword:
oSurface -> SetProperty, TEXTURE_MAP = oImage, $
COLOR = [255, 255, 255]
For the clearest display of the texture map, set COLOR = [255, 255, 255]. If the
image does not have dimensions that are exact powers of 2, IDL resamples the
image into a larger size whose dimensions are the next powers of two greater than
the original dimensions. This resampling may cause unwanted sampling artifacts.
In this example, the image does have dimensions that are exact powers of two, so
no resampling occurs.

Note:
(If your texture does not have dimensions that are exact powers of 2 and you
do not want to introduce resampling artifacts, you can pad the texture with unused
data to a power of two and tell IDL to map only a subset of the texture onto the
surface.) For example, if your image is 40 by 40, create a 64 by 64 image and fill
part of it with the image data:
textureImage = BYTARR(64, 64, /NOZERO)
textureImage[0:39, 0:39] = image ; image is 40 by 40
oImage = OBJ_NEW('IDLgrImage', textureImage)
Then, construct texture coordinates that map the active part of the texture to the
surface (oSurface):
; map the active 40-by-40 region of the 64-by-64 texture
textureCoords = [[0, 0], [40./64., 0], [40./64., 40./64.], [0, 40./64.]]


oSurface -> SetProperty, TEXTURE_COORD = textureCoords
The surface object in IDL 5.6 has been enhanced to perform the above
calculation automatically. In the above example, just use the image data (the 40 by
40 array) to create the image texture and do not supply texture coordinates. IDL
computes the appropriate texture coordinates to correctly use the 40 by 40 image.
Note:
(Some graphic devices have a limit for the maximum texture size. If your
texture is larger than the maximum size, IDL scales it down into dimensions that
work on the device. This rescaling may introduce resampling artifacts and loss of
detail in the texture. To avoid this, use the TEXTURE_HIGHRES keyword to tell
IDL to draw the surface in smaller pieces that can be texture mapped without loss
of detail.)
3. Add the surface object, covered by the satellite image, to the model object.
Then add the model to the view object:
oModel -> Add, oSurface
oView -> Add, oModel
4. Rotate the model for better display in the object window. Without rotating the
model, the surface is displayed at a 90 degree elevation angle, containing no depth
information. The following lines rotate the model 90 degrees away from the viewer
along the x-axis and 30 degrees clockwise along the y-axis and the x-axis:
oModel -> ROTATE, [1, 0, 0], -90
oModel -> ROTATE, [0, 1, 0], 30
oModel -> ROTATE, [1, 0, 0], 30

5. Display the result in the Object Graphics window:
oWindow -> Draw, oView

Fig. (3.9): Image Mapped onto a Surface in an Object Graphics Window


6. Display the results using XOBJVIEW, setting SCALE = 1 (instead of the
default value of 1/SQRT(3)) to increase the size of the initial display:
XOBJVIEW, oModel, /BLOCK, SCALE = 1
This results in the following display:

Fig.( 3.10) Displaying the Image Mapped onto the Surface in XOBJVIEW

After displaying the model, you can rotate it by clicking in the application
window and dragging your mouse. Select the magnify button, then click
near the middle of the image. Drag your mouse away from the center of the display
to magnify the image or toward the center of the display to shrink the image. Select
the left-most button on the XOBJVIEW toolbar to reset the display.
7. Destroy unneeded object references after closing the display windows:
OBJ_DESTROY, [oView, oImage]
The oModel and oSurface objects are automatically destroyed when oView is
destroyed.
For an example of mapping an image onto a regular surface using both Direct
and Object Graphics displays, see “Mapping an Image onto a Sphere”


3.8.5 | Mapping an Image onto a Sphere
The following example maps an image containing a color representation of
world elevation onto a sphere using both Direct and Object Graphics displays. The
example is broken down into two sections:
• “Mapping an Image onto a Sphere Using Direct Graphics” .
• “Mapping an Image Object onto a Sphere” .
3.9 | MAPPING OFFLINE
In the absence of a network or services, we can identify and follow a track
using image processing techniques. We incorporate a map made of images of the
places familiar to the person and determine how to reach them and return clearly
and safely.
We calculate distances using the Matlab function imdistline and, assuming a
walking speed, calculate the time it takes to get from one point to another. We then
guide the person along the road through voice commands, for example to move
forward, backward, left or right. We thus provide another way to perform mapping
without being online.
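A minimal sketch of the off-line distance and time estimate; the map file, waypoints, map scale and walking speed below are all assumed values for illustration:

map = imread('home_map.jpg');          % placeholder map image
pxPerMeter = 20;                       % assumed map scale, pixels per meter
speed = 1.2;                           % assumed walking speed in m/s
p1 = [50 120]; p2 = [410 300];         % assumed waypoints [x y] in pixels
distPx = hypot(p2(1) - p1(1), p2(2) - p1(2)); % straight-line pixel distance
distM  = distPx / pxPerMeter;          % distance in meters
tSec   = distM / speed;                % estimated travel time in seconds
fprintf('Distance %.1f m, about %.0f s\n', distM, tSec);
% For interactive measurement on the displayed map, the Image processing
% tool box tools imshow(map); h = imdistline; give the same pixel distance.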

CHAPTER 4
GPS Navigation

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 

Smart blind stick book

Contents

Chapter-01: Introduction
  1.1 Problem Definition
  1.2 Problem Solution
  1.3 Business Model
  1.4 Block Diagram
  1.5 Detailed Technical Description
  1.6 Pre-Project Planning
  1.7 Time Planning

Chapter-02: Speech Recognition
  2.1 Introduction
  2.2 Literature Review
    2.2.1 Pattern recognition
    2.2.2 Generation of voice
    2.2.3 Voice as biometric
    2.2.4 Speech recognition
    2.2.5 Speaker recognition
    2.2.6 Speech/speaker modeling
  2.3 Implementation Details
    2.3.1 Pre-processing and feature extraction
  2.4 Artificial Neural Networks
    2.4.1 Introduction
    2.4.2 Models
    2.4.3 Network function
    2.4.4 ANN dependency graph
    2.4.5 Learning
    2.4.6 Choosing a cost function
    2.4.7 Learning paradigms
    2.4.8 Supervised learning
    2.4.9 Unsupervised learning
    2.4.10 Reinforcement learning
    2.4.11 Learning algorithms
    2.4.12 Employing artificial neural networks
    2.4.13 Applications
    2.4.14 Types of models
    2.4.15 Neural network software
    2.4.16 Types of artificial neural networks
    2.4.17 Confidence analysis of neural networks

Chapter-03: Image Processing
  3.1 Introduction
    3.1.1 What is digital image processing?
    3.1.2 Motivating problems
  3.2 Color Vision
    3.2.1 Fundamentals
    3.2.2 Image formats supported by MATLAB
    3.2.3 Working formats in MATLAB
  3.3 Aspects of Image Processing
  3.4 Image Types
    3.4.1 Intensity image
    3.4.2 Binary image
    3.4.3 Indexed image
    3.4.4 RGB image
    3.4.5 Multi-frame image
  3.5 How To
    3.5.1 How to convert between different formats
    3.5.2 How to read a file
    3.5.3 Loading and saving variables in MATLAB
    3.5.4 How to display an image in MATLAB
  3.6 Some Important Definitions
    3.6.1 The imread function
    3.6.2 Rotation
    3.6.3 Scaling
    3.6.4 Interpolation
  3.7 Edge Detection
    3.7.1 Canny edge detection
    3.7.2 Edge tracing
  3.8 Mapping
    3.8.1 Mapping an image onto a surface: overview
    3.8.2 Mapping an image onto elevation data
    3.8.3 Initializing the IDL display objects
    3.8.4 Displaying image and geometric surface objects
    3.8.5 Mapping an image onto a sphere
  3.9 Mapping Offline

Chapter-04: GPS Navigation
  4.1 Introduction
    4.1.1 What is GPS?
    4.1.2 How does it work?
  4.2 Basic Concepts of GPS
  4.3 Position Calculation
  4.4 Communication
  4.5 Message Format
  4.6 Satellite Frequencies
  4.7 Navigation Equations
  4.8 Bancroft's Method
  4.9 Trilateration
  4.10 Multidimensional Newton-Raphson Calculation
  4.11 Additional Method for More than Four Satellites
  4.12 Error Sources and Analysis
  4.13 Accuracy Enhancement and Surveying
    4.13.1 Augmentation
    4.13.2 Precise monitoring
  4.14 Time Keeping
    4.14.1 Time keeping and leap seconds
    4.14.2 Time keeping accuracy
    4.14.3 Time keeping format
    4.14.4 Carrier phase tracking
  4.15 GPS Navigation

Chapter-05: Ultrasound
  5.1 Introduction
    5.1.1 History
  5.2 Wave Motion
  5.3 Wave Characteristics
  5.4 Ultrasound Intensity
  5.5 Ultrasound Velocity
  5.6 Attenuation of Ultrasound
  5.7 Reflection
  5.8 Refraction
  5.9 Absorption
  5.10 Hardware Part
    5.10.1 Introduction
    5.10.2 Calculating the distance
    5.10.3 Changing beam pattern and beam width
    5.10.4 The development of the sensor

Chapter-06: Microcontroller
  6.1 Introduction
    6.1.1 History of microcontrollers
    6.1.2 Embedded design
    6.1.3 Interrupts
    6.1.4 Programs
    6.1.5 Other microcontroller features
    6.1.6 Higher integration
    6.1.7 Programming environment
  6.2 Types of Microcontrollers
    6.2.1 Interrupt latency
  6.3 Microcontroller Embedded Memory Technology
    6.3.1 Data
    6.3.2 Firmware
  6.4 PIC Microcontroller
    6.4.1 Family core architecture
  6.5 PIC Components
    6.5.1 Logic circuit
    6.5.2 Power supply
  6.6 Development Tools
    6.6.1 Device programmers
    6.6.2 Debugging
  6.7 LCD Display
    6.7.1 LCD display pins
    6.7.2 LCD screen
    6.7.3 LCD memory
    6.7.4 LCD basic commands
    6.7.5 LCD connection
    6.7.6 LCD initialization

Chapter-07: System Implementation
  7.1 Introduction
  7.2 Survey
  7.3 Searches
    7.3.1 Ultrasound sensor
    7.3.2 Indoor navigation systems
    7.3.3 Outdoor navigation
  7.4 Sponsors
  7.5 Pre-Design
    7.5.1 List of metrics
    7.5.2 Competitive benchmarking information
    7.5.3 Ideal and marginally acceptable target values
    7.5.4 Time plan diagram
  7.6 Design
    7.6.1 Speech recognition
    7.6.2 Ultrasound sensors
    7.6.3 Outdoor navigation
  7.7 Product Architecture
    7.7.1 Product schematic
    7.7.2 Rough geometric layout
    7.7.3 Incidental interactions
  7.8 Defining Secondary Systems
  7.9 Detailed Interface Specification
  7.10 Establishing the Architecture of the Chunks

Chapter-08: Conclusion
  8.1 Introduction
  8.2 Overview
    8.2.1 Outdoor navigation
      8.2.1.1 Outdoor navigation online
      8.2.1.2 Outdoor navigation offline
    8.2.2 Ultrasound sensor
    8.2.3 Object identifier
  8.3 Features
Chapter 1 | Introduction

1.1 | PROBLEM DEFINITION

According to the World Health Organization, approximately 36.9 million people worldwide were blind in 2002. The majority of them use a conventional white cane as a navigation aid. The limitation of the white cane is that information is gained only by touching objects with the tip of the cane. The traditional length of a white cane depends on the height of the user: it extends from the floor to the person's sternum.

Blind people also face great difficulty moving from place to place in town; their only alternative is a guide dog, which can cost about $20,000 and remains useful for only about 5-6 years. They also find it hard to identify the objects they frequently use at home, such as kitchen tools and clothes. Finally, they may struggle to control their electrical appliances, and they cannot respond effectively to a security problem.

1.2 | PROBLEM SOLUTION

We are trying to solve all of the problems above. To help the user move easily indoors and outdoors, we use an ultrasound sensor that detects obstacles in the way and alerts the user in two ways: a vibration motor whose speed increases as the distance decreases, and a voice alert that announces the distance between the user and the obstacle.

To solve the problem of moving outside the home from place to place, we designed smartphone software that guides the user with voice commands and without any external help: the user simply says the name of the place he wants to go, and the phone guides him there by voice, with no need to type anything.

To help the user identify objects, we use RFID: every important object carries a tag, and when the reader reads the tag ID, the system announces the object's name by voice.

Inside the home, we designed a system that controls all electronic devices by voice commands, together with a security system designed especially for blind users. Its most important element is the fire alarm: when a fire is detected, the system alerts the user with a call to his mobile phone and places another call to friends nearby for help. The security system also warns the user if he forgets to close his door.

After finishing these applications, we plan to add further features after graduation: new technologies to make moving in the street easier, to help with crossing roads, and to assist with reading books. The products currently available in the Egyptian market do not cover any of these needs.
A blind person needs to move, control his surroundings, and do his tasks by himself without help from anybody, yet today there is only a plain white stick with no technology or features. So, in the final product, we install a sensor and an RFID reader on the white stick, while the other part is software on the mobile phone that performs the navigation and automation tasks.

1.3 | BUSINESS MODEL

Our customers are blind and visually impaired people; almost one million people in Egypt have one of the problems above. Our product covers several of their needs: it helps them avoid obstacles in their way and guides them by voice toward the direction they must take, and it lets them move freely without external help in different places through an Android application designed especially for them, which guides them by voice along roads and tells them which direction to take to reach their goal.

To reach our goal, we met with different customers to learn exactly what they need; this gave us a vision of a final product that is comfortable to use. We were also guided technically by our sponsors to find the best way to cover all these needs. The products available in our market do not cover any of these needs; we found only a plain white stick without any assistive technology.

1.4 | BLOCK DIAGRAMS

Fig. (1.1): General Project Block Diagram
1.5 | DETAILED TECHNICAL DESCRIPTION

Our project is built on the simplest available technologies so that the result is comfortable for the user. We divided the project into two parts, software and hardware. The hardware part consists of a PIC MCU, an MP3 module, a camera module, and an ultrasound sensor module. The software part is an Android application installed on the mobile phone.

The hardware part has two operating modes, indoor and outdoor. Indoors, a single sensor measures range, and when the user comes within 2 cm of an object the camera module photographs the code attached to it and sends the image to the MCU. The MCU processes the image, identifies the code number, looks up the object name in a database, then requests from the WT588D MP3 module the address of the MP3 file containing that name and plays it through the speaker.

Outdoors, three HC-SR04 ultrasound sensors are activated in three directions to determine the clearest path with no obstacles on it. The measured data is sent to the MCU, which selects the best direction and sends the MP3 module the address of the clip announcing that direction, which is then played as the output.

For outdoor navigation we designed an Android application using Google Maps: the user speaks the name of the destination, the application determines his current position using GPS, the digital compass determines the viewing angle, and the application guides him using the GPS and compass data.

Choose Mode: Left Button selects Outdoor, Right Button selects Indoor.

Fig. (1.2): Button Configuration
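As a rough illustration of the outdoor direction-selection logic described above (not the project's actual PIC firmware), the following Python sketch picks the direction whose ultrasound reading shows the most clearance. The sensor-reading function, the direction labels, and the clearance threshold are all assumptions for the example.

# Hypothetical sketch of the outdoor best-direction logic, not the real firmware.
# read_distance_cm() stands in for an HC-SR04 driver and is an assumption.

DIRECTIONS = ["left", "front", "right"]

def read_distance_cm(direction: str) -> float:
    """Placeholder for an HC-SR04 echo-time measurement in one direction."""
    raise NotImplementedError("replace with real sensor I/O")

def best_direction(readings: dict[str, float], min_clearance_cm: float = 50.0) -> str:
    """Return the direction with the most free space, or 'stop' if all are blocked."""
    direction, distance = max(readings.items(), key=lambda kv: kv[1])
    return direction if distance >= min_clearance_cm else "stop"

if __name__ == "__main__":
    # Example with fixed readings instead of live sensors.
    sample = {"left": 35.0, "front": 120.0, "right": 80.0}
    print(best_direction(sample))  # -> "front"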
Fig. (1.3): Indoor & Outdoor Processes Block Diagram

1.6 | PRE-PROJECT PLANNING

We started by searching for a problem that nobody was addressing, and we found that blind people's problems receive little attention and that suitable products are not available in Egypt. So we judged this a good field to start in: an opportunity to solve a real problem and to enter a new market segment with few competitors.

1.7 | TIME PLANNING

Project Timing: The three main parts are independent in execution time, but each part has several branches that must be executed in series.

Timing of Product Introduction: The timing of launching the product depends on the marketing effort and on renewed study of the market, since the product must have low cost and high quality.
Technology Readiness: Technology is a fundamental component of the product, because Android and ultrasonic technology are gaining real importance among Egyptian customers.

Market Readiness: The market is always ready for a new product; products compete in the market to give customers the best choice.

The Product Plan: This plan makes the project easier to implement, because anything arranged or planned in advance gives the best results.
Chapter 2 | Speech Recognition

2.1 | INTRODUCTION

Biometrics is, in the simplest definition, something you are: a physical characteristic unique to each individual, such as a fingerprint, retina, iris, or voice. Biometrics has a very useful application in security; it can be used to authenticate a person's identity and control access to a restricted area, based on the premise that this set of physical characteristics can uniquely identify an individual.

A speech signal conveys two important types of information: primarily the speech content, and on a secondary level the speaker identity. Speech recognizers aim to extract the lexical information from the speech signal independently of the speaker, by reducing inter-speaker variability. Speaker recognition, on the other hand, is concerned with extracting the identity of the person speaking the utterance. So both speech recognition and speaker recognition are possible from the same voice input. In our project we use speech recognition, because the stick must act on the word that was spoken.

Mel-Frequency Cepstral Coefficients (MFCCs) are used as features for both speech and speaker recognition. We also combined energy features and the delta and delta-delta features of energy and MFCC. After calculating the features, neural networks are used to model the speech. Based on the speech model, the system decides whether or not the uttered speech matches what the user was prompted to utter.

2.2 | LITERATURE REVIEW

2.2.1 | Pattern Recognition

Pattern recognition, one of the branches of artificial intelligence and a sub-field of machine learning, is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. A pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech signal, a sales pattern, etc.

The applications of pattern recognition include data mining, document classification, financial forecasting, organization and retrieval of multimedia databases, and biometrics (personal identification based on physical attributes such as the face, retina, voice, ear, and fingerprints). The essential steps of pattern recognition are data acquisition, preprocessing, feature extraction, training, and classification.
Features denote the descriptors. Features must be selected so that they are discriminative and invariant; they can be represented as a vector, matrix, tree, graph, or string. Ideally they are similar for objects in the same class and very different for objects in different classes. A pattern class is a family of patterns that share some common properties. Pattern recognition by machine involves techniques for assigning patterns to their respective classes automatically, with as little human intervention as possible.

Learning and classification usually follow one of these approaches. Statistical pattern recognition is based on statistical characterizations of patterns, assuming that the patterns are generated by a probabilistic system. Syntactic (or structural) pattern recognition is based on the structural interrelationships of features.

Given a pattern, its recognition/classification may consist of one of the following two tasks, according to the type of learning procedure:

1) Supervised classification (e.g., discriminant analysis), in which the input pattern is identified as a member of a predefined class.
2) Unsupervised classification (e.g., clustering), in which the pattern is assigned to a previously unknown class.

Fig. (2.1): General block diagram of a pattern recognition system
2.2.2 | Generation of Voice

Speech begins with the generation of an airstream, usually by the lungs and diaphragm, a process called initiation. This air then passes through the larynx, where it is modulated by the glottis (vocal cords). This step is called phonation or voicing, and is responsible for the generation of pitch and tone. Finally, the modulated air is filtered by the mouth, nose, and throat, a process called articulation, and the resultant pressure wave excites the air.

Fig. (2.2): Vocal Schematic

Depending upon the positions of the various articulators, different sounds are produced. The articulator positions can be modeled by a linear time-invariant system whose frequency response is characterized by several peaks called formants. The change in the frequencies of the formants characterizes the phoneme being articulated.

As a consequence of this physiology, we can notice several characteristics in the frequency-domain spectrum of speech. First of all, the oscillation of the glottis results in an underlying fundamental frequency and a series of harmonics at multiples of this fundamental.
This is shown in the figure below, where we plot a brief audio waveform for the phoneme /i:/ together with its magnitude spectrum. The fundamental frequency (180 Hz) and its harmonics appear as spikes in the spectrum. The location of the fundamental frequency is speaker dependent, a function of the dimensions and tension of the vocal cords. For adults it usually falls between 100 Hz and 250 Hz, with female averages significantly higher than male averages.

Fig. (2.3): Audio sample for the /i:/ phoneme, showing the stationary property of phonemes over a short period

Sound comes out in phonemes, the building blocks of speech. Each phoneme resonates at a fundamental frequency and its harmonics, and thus has high energy at those frequencies; in other words, each phoneme has different formants. This is the feature that enables the identification of each phoneme at the recognition stage.

Fig. (2.4): Audio magnitude spectrum for the /i:/ phoneme, showing the fundamental frequency and its harmonics
The variations in the features of the speech signal during the utterance of a word are modeled in word training for speech recognition; for speaker recognition, the intra-speaker variations in features over long speech content are modeled. Besides the configuration of the articulators, the acoustic manifestation of a phoneme is affected by:
- the physiology and emotional state of the speaker;
- the phonetic context;
- accent.

2.2.3 | Voice as Biometric

The underlying premise for voice authentication is that each person's voice differs enough in pitch, tone, and volume to be uniquely distinguishable. Several factors contribute to this uniqueness: the size and shape of the mouth, throat, nose, and teeth (the articulators) and the size, shape, and tension of the vocal cords. The chance that all of these are exactly the same in any two people is very low. Voice biometrics has the following advantages over other forms of biometrics:
- it is a natural signal to produce;
- implementation cost is low, since it does not require a specialized input device;
- it is acceptable to users and is easily combined with other forms of authentication for multi-factor authentication; it is the only biometric that allows users to authenticate remotely.

2.2.4 | Speech Recognition

Speech is the dominant means of communication between humans, and it promises to be important for communication between humans and machines, if it can be made a little more reliable. Speech recognition is the process of converting an acoustic signal to a set of words. Applications include voice command and control, data entry, voice user interfaces, automating the telephone operator's job in telephony, etc. Speech recognition can also serve as the input to natural language processing.

There are two variants of speech recognition based on the duration of the speech signal: isolated word recognition, in which each word is surrounded by some sort of pause, is much easier than continuous speech recognition, in which words run into each other and have to be segmented. Speech recognition is a difficult task because of the many sources of variability associated with the signal.
For example, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on context. Acoustic variability can also result from changes in the environment and in the position and characteristics of the transducer. Within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape contribute to cross-speaker variability. Such variability is modeled in various ways; at the level of signal representation, a representation that emphasizes speaker-independent features is developed.

2.2.5 | Speaker Recognition

Speaker recognition is the process of automatically recognizing who is speaking on the basis of the individual information included in the speech waves. Speaker recognition can be classified into identification and verification, and it has been applied most often as a means of biometric authentication.

2.2.5.1 | Types of Speaker Recognition

Speaker Identification

Speaker identification is the process of determining which registered speaker produced a given utterance. In a Speaker Identification (SID) system, no identity claim is provided; the test utterance is scored against a set of known (registered) references for each potential speaker, and the speaker whose model best matches the test utterance is selected. There are two types of speaker identification task: closed-set and open-set. In closed-set identification, the test utterance belongs to one of the registered speakers; during testing, a matching score is estimated for each registered speaker, and the speaker corresponding to the model with the best score is selected. This requires N comparisons for a population of N speakers. In open-set identification, any speaker can access the system, and those who are not registered should be rejected. This requires another model, referred to as a garbage model, imposter model, or background model, trained with data provided by speakers other than the registered ones. During testing, the matching score of the best speaker model is compared with the score estimated using the garbage model in order to accept or reject the speaker, making the total number of comparisons N + 1.
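The closed-set and open-set decision rules described above reduce to a few lines of logic. The following Python sketch is illustrative only; the per-speaker scores (e.g., log-likelihoods from speaker models) and the example values are assumed to come from elsewhere.

# Illustrative sketch of closed-set vs. open-set speaker identification decisions.
# The per-speaker scores are assumed to be computed by trained speaker models.

def identify_closed_set(scores: dict[str, float]) -> str:
    """Closed-set SID: pick the registered speaker with the best matching score."""
    return max(scores, key=scores.get)

def identify_open_set(scores: dict[str, float], garbage_score: float) -> str:
    """Open-set SID: accept the best speaker only if they beat the garbage model."""
    best = max(scores, key=scores.get)       # N comparisons
    if scores[best] > garbage_score:         # +1 comparison against the imposter model
        return best
    return "rejected"

if __name__ == "__main__":
    scores = {"alice": -41.2, "bob": -38.7, "carol": -45.0}   # hypothetical values
    print(identify_closed_set(scores))        # -> "bob"
    print(identify_open_set(scores, -37.5))   # -> "rejected"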
Speaker identification performance tends to decrease as the population size increases.

Speaker Verification

Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker: the goal is to automatically accept or reject an identity that is claimed by the speaker. During testing, a verification score is estimated using the claimed speaker model and an anti-speaker model; this score is then compared to a threshold. If the score is higher than the threshold the speaker is accepted; otherwise the speaker is rejected. Speaker verification thus involves a hypothesis test requiring a simple binary decision, accept or reject the claimed identity, regardless of the population size. Hence performance is largely independent of the population size, though it depends on the number of test utterances used to evaluate the system.

2.2.6 | Speaker/Speech Modeling

There are various pattern modeling and matching techniques, including Dynamic Time Warping (DTW), the Gaussian Mixture Model (GMM), Hidden Markov Modeling (HMM), Artificial Neural Networks (ANN), and Vector Quantization (VQ). These are used interchangeably for speech and speaker modeling. The best-established approaches are statistical learning methods: GMMs for speaker recognition, which model the variations in a speaker's features over a long sequence of utterances, and HMMs, which are widely used for speech recognition. An HMM models the Markovian nature of the speech signal, where each phoneme represents a state and a sequence of such phonemes represents a word; the sequences of features of these phonemes from different speakers are modeled by the HMM.

2.3 | IMPLEMENTATION DETAILS

The implementation of the system includes a common pre-processing and feature extraction module, followed by speaker-independent speech modeling and classification by ANNs.

2.3.1 | Pre-Processing and Feature Extraction
Starting from the capture of the audio signal, feature extraction consists of the following steps, as shown in the block diagram below: speech signal, silence removal, pre-emphasis, framing, windowing, DFT, Mel filter bank, log, IDFT, and CMS, yielding per frame 12 MFCC, 12 ΔMFCC, 12 ΔΔMFCC, 1 energy, 1 Δ-energy, and 1 ΔΔ-energy features.

Fig. (2.5): Pre-Processing and Feature Extraction

2.3.1.1 | Capture

The first step in processing speech is to convert the analog representation (first air pressure, then analog electric signals in a microphone) into a digital signal x[n], where n is an index over time. Analysis of the audio spectrum shows that nearly all the energy resides in the band between DC and 4 kHz, and beyond 10 kHz there is virtually no energy whatsoever. The sound format used is: 22050 Hz, 16-bit signed little-endian, mono channel, uncompressed PCM.
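As a small sketch of the capture and amplitude-normalization steps, the Python snippet below loads a WAV file in the stated format (16-bit signed little-endian mono PCM at 22050 Hz) and scales it to [-1, 1]; the file name is hypothetical.

# Minimal sketch: load 16-bit mono PCM audio and normalize its amplitude.
import wave
import numpy as np

def load_pcm16(path: str) -> tuple[np.ndarray, int]:
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1 and wf.getsampwidth() == 2  # mono, 16-bit
        rate = wf.getframerate()                                  # expect 22050 Hz
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64)  # signed little-endian
    return samples / 32768.0, rate                                # PCM normalization

# x, fs = load_pcm16("utterance.wav")   # "utterance.wav" is a placeholder name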
2.3.1.2 | End Point Detection and Silence Removal

The captured audio signal may contain silence at different positions: at the beginning of the signal, between the words of a sentence, at the end of the signal, and so on. If silent frames are included, modeling resources are spent on parts of the signal that do not contribute to identification, so the silence must be removed before further processing. There are several ways of doing this; the most popular are short-time energy and zero-crossing rate, but both have limitations, since their thresholds are set on an ad hoc basis. The algorithm we used instead relies on the statistical properties of the background noise, together with a physiological aspect of speech production, and does not assume any ad hoc threshold.

It assumes that the background noise present in the utterances is Gaussian in nature. Usually the first 200 ms or more of a speech recording (we used 4410 samples at a sampling rate of 22050 samples/sec) corresponds to silence (or background noise), because the speaker takes some time to start reading when recording begins.

Endpoint Detection Algorithm:

Step 1: Calculate the mean (μ) and standard deviation (σ) of the first 200 ms of samples of the given utterance. The background noise is characterized by this μ and σ.

Step 2: Go from the first sample to the last sample of the recording. For each sample, check whether the one-dimensional Mahalanobis distance, |x − μ| / σ, is greater than 3. If it is, the sample is treated as voiced; otherwise it is unvoiced/silence. This threshold rejects up to 99.7% of the noise samples, since P[|x − μ| ≤ 3σ] = 0.997 for a Gaussian distribution, thus accepting only the voiced samples.

Step 3: Mark each voiced sample as 1 and each unvoiced sample as 0. Divide the whole speech signal into 10 ms non-overlapping windows, so the complete speech is represented by only zeros and ones.

Step 4: Suppose there are M zeros and N ones in a window. If M ≥ N, then convert each of the ones to zeros, and vice versa. This step reflects the fact that a speech production system consisting of the vocal cords, tongue, vocal tract, etc. cannot change abruptly within the short 10 ms window used here.

Step 5: Collect only the voiced part, according to the samples labeled 1 in the windowed array, and dump it into a new array; that is, retrieve the voiced part of the original speech signal from the samples labeled 1.
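A compact NumPy sketch of the five steps above follows. The 3σ threshold, 200 ms noise window, and 10 ms voting windows come from the text; the array-handling details are our own.

# Sketch of the Gaussian endpoint-detection algorithm described above.
import numpy as np

def remove_silence(x: np.ndarray, fs: int = 22050) -> np.ndarray:
    noise = x[: int(0.2 * fs)]                     # Step 1: first 200 ms = background noise
    mu, sigma = noise.mean(), noise.std()
    voiced = np.abs(x - mu) / sigma > 3.0          # Step 2: 3-sigma Mahalanobis test

    win = int(0.010 * fs)                          # Step 3: 10 ms non-overlapping windows
    n_win = len(x) // win
    labels = voiced[: n_win * win].reshape(n_win, win)

    keep = labels.sum(axis=1) > (win // 2)         # Step 4: majority vote per window
    mask = np.repeat(keep, win)                    # Step 5: keep only the voiced windows
    return x[: n_win * win][mask]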
Fig. (2.6): Input signal to the endpoint detection system

Fig. (2.7): Output signal from the endpoint detection system

2.3.1.3 | PCM Normalization

The extracted pulse-code-modulated amplitude values are normalized, to avoid amplitude variation during capture.

2.3.1.4 | Pre-emphasis

The speech signal is usually pre-emphasized before any further processing. If we look at the spectrum of voiced segments such as vowels, there is more energy at lower frequencies than at higher frequencies. This drop in energy across frequencies is caused by the nature of the glottal pulse. Boosting the high-frequency energy makes information from the higher formants more available to the acoustic model and improves phone detection accuracy. The pre-emphasis filter is a first-order high-pass filter. In the time domain, with input x[n] and 0.9 ≤ α ≤ 1.0, the filter equation is:

y[n] = x[n] − α x[n−1]

We used α = 0.95.
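The filter equation above translates directly into a few lines of NumPy; the handling of the first sample (which has no predecessor) is our own choice.

# Sketch of the first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1].
import numpy as np

def pre_emphasize(x: np.ndarray, alpha: float = 0.95) -> np.ndarray:
    y = np.empty_like(x)
    y[0] = x[0]                      # first sample is passed through unchanged
    y[1:] = x[1:] - alpha * x[:-1]
    return y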
Fig. (2.8): Signal before pre-emphasis

Fig. (2.9): Signal after pre-emphasis

2.3.1.5 | Framing and Windowing

Speech is a non-stationary signal, meaning that its statistical properties are not constant across time. Instead, we extract spectral features from a small window of speech that characterizes a particular sub-phone, for which we can make the (rough) assumption that the signal is stationary, i.e., that its statistical properties are constant within this region. We used frame blocks of 23.22 ms with 50% overlap, i.e., 512 samples per frame.
Fig. (2.10): Frame blocking of the signal

The rectangular window (i.e., no window) can cause problems when we do Fourier analysis, because it abruptly cuts off the signal at its boundaries. A good window function has a narrow main lobe and low side-lobe levels in its transfer function; it shrinks the values of the signal toward zero at the window boundaries, avoiding discontinuities. The most commonly used window function in speech processing is the Hamming window, defined as follows:

w[n] = 0.54 − 0.46 cos(2πn / (N − 1)),  0 ≤ n ≤ N − 1

Fig. (2.11): Hamming window

The extraction of the signal takes place by multiplying the value of the signal at time n, Sframe[n], by the value of the window at time n, Sw[n]:

Y[n] = Sw[n] × Sframe[n]
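The following sketch combines the framing and windowing steps, using the 512-sample frames with 50% overlap and the Hamming window stated above; it assumes the signal is at least one frame long, and trailing samples that do not fill a frame are dropped.

# Sketch: split the signal into 512-sample frames with 50% overlap and apply
# the Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)) to each frame.
import numpy as np

def frame_and_window(x: np.ndarray, frame_len: int = 512) -> np.ndarray:
    hop = frame_len // 2                                  # 50% overlap
    n_frames = 1 + (len(x) - frame_len) // hop
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * w                                     # Y[n] = Sw[n] * Sframe[n]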
Fig. (2.12): A single frame before and after windowing

2.3.1.6 | Discrete Fourier Transform

A Discrete Fourier Transform (DFT) of the windowed signal is used to extract the frequency content (the spectrum) of the current frame. The DFT is the tool for extracting spectral information, i.e., how much energy the signal contains at discrete frequency bands, from a discrete-time (sampled) signal. The input to the DFT is a windowed signal x[n]…x[m], and the output, for each of N discrete frequency bands, is a complex number X[k] representing the magnitude and phase of that frequency component in the original signal:

X[k] = Σ(n=0 to N−1) x[n] e^(−j2πkn/N)

The commonly used algorithm for computing the DFT is the Fast Fourier Transform, or FFT for short.

2.3.1.7 | Mel Filter

For calculating the MFCCs, a frequency transformation is first applied according to the following formula:

mel(x) = 2595 log10(1 + x/700)

where x is the linear frequency in Hz. Then a filter bank is applied to the amplitude of the Mel-scaled spectrum. The Mel frequency warping is most conveniently done by utilizing a filter bank with filters centered according to Mel frequencies.
The width of the triangular filters varies according to the Mel scale, so that the log total energy in a critical band around the center frequency is included. The centers of the filters are uniformly spaced on the Mel scale.

Fig. (2.13): Equally spaced Mel values

The result of the Mel filter stage is information about the distribution of energy in each Mel-scale band: we obtain a vector of filter outputs, one per filter.

Fig. (2.14): Triangular filter bank in frequency scale

We used 30 filters in the filter bank.
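A sketch of the FFT magnitude spectrum followed by a 30-filter triangular Mel filter bank is shown below. The mel formula and the uniform Mel spacing come from the text; the bin-mapping details and the use of NumPy's rfft are our own assumptions.

# Sketch: magnitude spectrum via FFT, then a 30-filter triangular Mel filter bank.
# mel(f) = 2595*log10(1 + f/700); filter centers are uniformly spaced in Mel.
import numpy as np

def mel(f):      return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_inv(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=30, n_fft=512, fs=22050) -> np.ndarray:
    # Filter edge frequencies, uniformly spaced on the Mel scale.
    edges = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft / 2) * edges / (fs / 2)).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):                           # triangular filters
        lo, ctr, hi = bins[i], bins[i + 1], bins[i + 2]
        if ctr > lo:
            fb[i, lo:ctr] = (np.arange(lo, ctr) - lo) / (ctr - lo)   # rising edge
        if hi > ctr:
            fb[i, ctr:hi] = (hi - np.arange(ctr, hi)) / (hi - ctr)   # falling edge
    return fb

def mel_energies(frames: np.ndarray, fb: np.ndarray) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(frames, n=(fb.shape[1] - 1) * 2))  # |X[k]|
    return spectrum @ fb.T                               # energy per Mel band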
2.3.1.8 | Cepstrum by Inverse Discrete Fourier Transform

A cepstrum transform is applied to the filter outputs in order to obtain the MFCC features of each frame. The triangular filter outputs Y(i), i = 0, 1, 2, …, M−1, are compressed using the logarithm, and the discrete cosine transform (DCT) is applied. Here M equals the number of filters in the filter bank, i.e., 30:

C[n] = Σ(i=0 to M−1) log(Y(i)) cos(πn(i + 0.5)/M)

where C[n] is the MFCC vector for each frame. The resulting vector is called the Mel-frequency cepstrum (MFC), and its individual components are the Mel-frequency cepstral coefficients (MFCCs). We extracted 12 features from each speech frame.

2.3.1.9 | Post Processing

Cepstral Mean Subtraction (CMS)

A speech signal may be subjected to some channel noise when recorded, also referred to as the channel effect. A problem arises if the channel effect when recording the training data for a given person is different from the channel effect in later recordings when the person uses the system: a false distance between the training data and the newly recorded data is introduced by the differing channel effects. The channel effect is eliminated by subtracting the mean Mel-cepstrum coefficients from the Mel-cepstrum coefficients:

Ĉt[n] = Ct[n] − (1/T) Σ(τ=1 to T) Cτ[n]

The Energy Feature

The energy in a frame is the sum over time of the power of the samples in the frame; thus, for a signal x in a window from time sample t1 to time sample t2, the energy is:

E = Σ(t=t1 to t2) x[t]²
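The log-DCT, CMS, and energy formulas above map onto the sketch below. Keeping coefficients n = 1 through 12 (i.e., skipping the zeroth DCT coefficient) is our own assumption, as is the small constant added to avoid log(0).

# Sketch: log + DCT on the 30 Mel energies (keep 12 MFCCs), plus CMS and
# the per-frame energy feature, matching the formulas above.
import numpy as np

def mfcc_from_mel(mel_energy: np.ndarray, n_ceps: int = 12) -> np.ndarray:
    n_frames, M = mel_energy.shape                       # M = 30 filters
    i = np.arange(M)
    log_e = np.log(mel_energy + 1e-12)                   # avoid log(0)
    basis = np.array([np.cos(np.pi * n * (i + 0.5) / M) for n in range(1, n_ceps + 1)])
    return log_e @ basis.T                               # C[n] for each frame

def cepstral_mean_subtraction(ceps: np.ndarray) -> np.ndarray:
    return ceps - ceps.mean(axis=0)                      # remove the channel effect

def frame_energy(frames: np.ndarray) -> np.ndarray:
    return (frames ** 2).sum(axis=1)                     # E = sum of x[t]^2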
The Delta Feature

Another interesting fact about the speech signal is that it is not constant from frame to frame. Co-articulation (the influence of one speech sound on an adjacent or nearby speech sound) can provide a useful cue for phone identity, and it can be preserved by using delta features. Velocity (delta) and acceleration (delta-delta) coefficients are obtained from the static window-based information; these delta and delta-delta coefficients model the speed and acceleration of the variation of the cepstral feature vectors across adjacent windows. A simple way to compute deltas would be just to take the difference between frames; thus the delta value d[t] for a particular cepstral value c[t] at time t can be estimated as:

d[t] = c[t+1] − c[t−1]

The differencing method is simple, but since it acts as a high-pass filtering operation in the parameter domain, it tends to amplify noise. The solution is linear regression, i.e., a first-order polynomial fit, whose least-squares solution is easily shown to be of the following form (a code sketch follows the feature list below):

d[t] = Σ(m=1 to M) m (c[t+m] − c[t−m]) / (2 Σ(m=1 to M) m²)

where M is the regression window size. We used M = 4.

Composition of the Feature Vector

We calculated 39 features from each frame:
- 12 MFCC features
- 12 delta MFCC
- 12 delta-delta MFCC
- 1 energy feature
- 1 delta energy feature
- 1 delta-delta energy feature
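Here is the promised sketch of the regression-based deltas with M = 4. Replicating the edge frames to handle the boundaries is our own assumption; delta-deltas are then just the deltas of the deltas.

# Sketch of the linear-regression delta features:
# d[t] = sum_{m=1..M} m*(c[t+m] - c[t-m]) / (2 * sum_{m=1..M} m^2), M = 4.
import numpy as np

def delta(features: np.ndarray, M: int = 4) -> np.ndarray:
    padded = np.pad(features, ((M, M), (0, 0)), mode="edge")  # replicate edge frames
    denom = 2.0 * sum(m * m for m in range(1, M + 1))
    out = np.zeros_like(features, dtype=float)
    for m in range(1, M + 1):
        out += m * (padded[M + m : M + m + len(features)]
                    - padded[M - m : M - m + len(features)])
    return out / denom

# Delta-deltas: dd = delta(delta(mfcc))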
2.4 | ARTIFICIAL NEURAL NETWORKS

2.4.1 | Introduction

We used ANNs to model our system: voices are trained, and test utterances are then classified into word categories, each of which triggers an action. Here we give an overview of artificial neural networks.

The original inspiration for the term Artificial Neural Network came from examination of central nervous systems and their neurons, axons, dendrites, and synapses, which constitute the processing elements of the biological neural networks investigated by neuroscience. In an artificial neural network, simple artificial nodes, variously called "neurons", "neurodes", "processing elements" (PEs), or "units", are connected together to form a network of nodes that mimics a biological neural network, hence the term "artificial neural network".

Because neuroscience is still full of unanswered questions, and since there are many levels of abstraction and therefore many ways to take inspiration from the brain, there is no single formal definition of an artificial neural network. Generally, it involves a network of simple processing elements that exhibit complex global behavior determined by the connections between the processing elements and by element parameters. While an artificial neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strengths (weights) of the connections in the network to produce a desired signal flow. These networks are also similar to biological neural networks in that functions are performed collectively and in parallel by the units, rather than through a clear delineation of subtasks to which various units are assigned (see also connectionism).

Currently, the term Artificial Neural Network (ANN) tends to refer mostly to the neural network models employed in statistics, cognitive psychology, and artificial intelligence. Neural network models designed to emulate the central nervous system (CNS) are a subject of theoretical neuroscience and computational neuroscience.

In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks, or parts of neural networks (such as artificial neurons), are used as components in larger systems that combine both adaptive and non-adaptive elements. While the more general approach of such adaptive systems is more suitable for real-world problem solving, it has far less to do with the traditional artificial-intelligence connectionist models. What they have in common, however, is the principle of non-linear, distributed, parallel, and local processing and adaptation. Historically, the use of neural network models marked a paradigm shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in if-then rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a dynamical system.

2.4.2 | Models
Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y or a distribution over X, or over both X and Y, but sometimes the models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase "ANN model" really means the definition of a class of such functions, where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity.

2.4.3 | Network Function

The word network in the term "artificial neural network" refers to the interconnections between the neurons in the different layers of each system. An example system has three layers: the first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems have more layers of neurons, some with increased numbers of input neurons and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations.

An ANN is typically defined by three types of parameters:
- the interconnection pattern between the different layers of neurons;
- the learning process for updating the weights of the interconnections;
- the activation function that converts a neuron's weighted input to its output activation.

Mathematically, a neuron's network function f(x) is defined as a composition of other functions gi(x), which can themselves be defined as compositions of further functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum

f(x) = K(Σi wi gi(x))

where K (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent. It will be convenient in the following to refer to a collection of functions gi simply as a vector g.

2.4.4 | ANN Dependency Graph

The figure depicts such a decomposition of f, with the dependencies between variables indicated by arrows. These can be interpreted in two ways. The first is the functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f. This view is most commonly encountered in the context of optimization.
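As a concrete (purely illustrative) instance of the nonlinear weighted-sum composition and the 3-dimensional-then-2-dimensional decomposition just described, the sketch below stacks three such layers in NumPy; the input size, tanh activation, and random weights are our own choices.

# Sketch of f(x) = K(sum_i w_i * g_i(x)) stacked into a 3-layer feed-forward net.
import numpy as np

rng = np.random.default_rng(0)

def layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.tanh(W @ x + b)          # K = tanh applied to the weighted sum

# Dimensions echoing the text: input -> 3-dim h -> 2-dim g -> scalar f.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
W3, b3 = rng.normal(size=(1, 2)), np.zeros(1)

def f(x: np.ndarray) -> np.ndarray:
    h = layer(x, W1, b1)               # first hidden layer (3-dimensional)
    g = layer(h, W2, b2)               # second hidden layer (2-dimensional)
    return layer(g, W3, b3)            # output

print(f(np.ones(4)))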
The second view is the probabilistic view: the random variable F depends upon the random variable G, which depends upon H, which depends upon the random variable X. This view is most commonly encountered in the context of graphical models. The two views are largely equivalent: in either case, for this particular network architecture, the components of the individual layers are independent of each other given their input (e.g., the components of g are independent of each other given h). This naturally enables a degree of parallelism in the implementation.

Two separate depictions of the recurrent ANN dependency graph. Networks such as the previous one are commonly called feed-forward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent; such networks are commonly depicted in the manner shown at the top of the figure, where f is shown as depending upon itself. However, an implied temporal dependence is not shown.

2.4.5 | Learning

What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve and a class of functions F, learning means using a set of observations to find f* in F that solves the task in some optimal sense. This entails defining a cost function C such that, for the optimal solution f*, C(f*) ≤ C(f) for all f in F; i.e., no solution has a cost less than the cost of the optimal solution (see mathematical optimization).

The cost function is an important concept in learning, as it is a measure of how far a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost. For applications where the solution depends on some data, the cost must necessarily be a function of the observations; otherwise we would not be modeling anything related to the data. It is frequently defined as a statistic to which only approximations can be made. As a simple example, consider the problem of finding the model f that minimizes the expected squared error C = E[(f(x) − y)²] for data pairs (x, y) drawn from some distribution D. In practical situations we would only have N samples from D, and thus, for the above example, we would only minimize the empirical average (1/N) Σ(f(xi) − yi)². Thus the cost is minimized over a sample of the data rather than over the entire data set.
When N → ∞, some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when D is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used even for finite datasets.

2.4.6 | Choosing a cost function

While it is possible to define some arbitrary, ad hoc cost function, frequently a particular cost will be used, either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An overview of the three main categories of learning tasks is provided below.

2.4.7 | Learning paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning.

2.4.8 | Supervised learning

In supervised learning, we are given a set of example pairs (x, y) with x ∈ X and y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.

A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output, f(x), and the target value y over all the example pairs. When one tries to minimize this cost using gradient descent for the class of neural networks called multilayer perceptrons, one obtains the common and well-known back-propagation algorithm for training neural networks.

Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as classification) and regression (also known as function approximation). The supervised learning paradigm is also applicable to sequential data (e.g., for speech and gesture recognition). This can be thought of as learning with a "teacher", in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.
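A minimal sketch of this training loop follows: gradient descent on the mean-squared error for a tiny one-hidden-layer perceptron. The toy data, network size and learning rate are all assumed; this illustrates the idea only and is not the recognizer used in this project.

% Gradient descent on the MSE cost for a small two-layer tanh network.
X = linspace(0, 2*pi, 50);            % 50 scalar training inputs (toy data)
Y = sin(X);                           % target values
nH = 5; N = numel(Y);                 % hidden units, number of samples
W1 = randn(nH, 1); b1 = zeros(nH, 1); % input-to-hidden weights and biases
W2 = randn(1, nH); b2 = 0;            % hidden-to-output weights and bias
eta = 0.05;                           % learning rate (assumed)
for epoch = 1:2000
    H = tanh(W1 * X + repmat(b1, 1, N));  % hidden activations
    E = (W2 * H + b2) - Y;                % output error
    dOut = 2 * E / N;                     % gradient of mean(E.^2) at output
    dW2 = dOut * H';        db2 = sum(dOut);
    dH = (W2' * dOut) .* (1 - H.^2);      % back-propagate through tanh
    dW1 = dH * X';          db1 = sum(dH, 2);
    W2 = W2 - eta * dW2;    b2 = b2 - eta * db2;   % gradient steps
    W1 = W1 - eta * dW1;    b1 = b1 - eta * db1;
end
mse = mean(((W2 * tanh(W1 * X + repmat(b1, 1, N)) + b2) - Y).^2)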
2.4.9 | Unsupervised learning

In unsupervised learning, some data x is given together with a cost function to be minimized, which can be any function of the data x and the network's output f. The cost function depends on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).

As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x − f(x))^2]. Minimizing this cost gives a value of a equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in compression it could be related to the mutual information between x and f(x), whereas in statistical modeling it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximized rather than minimized.)

Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.
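The trivial example above can be verified numerically in a couple of MATLAB lines; the sample data are assumed.

% Minimal check: for f(x) = a with cost C = E[(x - a)^2], the minimizing
% constant a is the mean of the data.
x = 2 * randn(1000, 1) + 5;                   % assumed sample data
a = fminsearch(@(a) mean((x - a).^2), 0);     % numerically minimize the cost
[a, mean(x)]                                  % the two values agree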
2.4.10 | Reinforcement learning

In reinforcement learning, data are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.

More formally, the environment is modeled as a Markov decision process (MDP) with states s ∈ S and actions a ∈ A and the following probability distributions: the instantaneous cost distribution P(c_t | s_t), the observation distribution P(x_t | s_t) and the transition P(s_{t+1} | s_t, a_t), while a policy is defined as the conditional distribution over actions given the observations. Taken together, the two define a Markov chain (MC). The aim is to discover the policy that minimizes the cost; i.e., the MC for which the cost is minimal.

ANNs are frequently used in reinforcement learning as part of the overall algorithm. Dynamic programming has been coupled with ANNs (neuro-dynamic programming) by Bertsekas and Tsitsiklis and applied to multi-dimensional nonlinear problems, such as those involved in vehicle routing or natural resources management, because of the ability of ANNs to mitigate losses of accuracy even when reducing the discretization grid density for numerically approximating the solution of the original control problems.

Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.
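The "minimize the expected long-term cost" objective can be illustrated with a few lines of value iteration on a tiny, completely made-up MDP; the transition probabilities, costs and discount factor below are all assumed, and this dynamic-programming sketch stands in for the neuro-dynamic methods mentioned above.

% Value iteration on an assumed 3-state, 2-action MDP, minimizing the
% expected discounted long-term cost.
nS = 3; nA = 2;
P = rand(nS, nS, nA);                         % P(:, :, a) plays P(s'|s, a)
P = bsxfun(@rdivide, P, sum(P, 2));           % normalize each row
c = [1 0; 2 1; 0 3];                          % instantaneous cost c(s, a)
gamma = 0.9;                                  % discount factor (assumed)
V = zeros(nS, 1); Q = zeros(nS, nA);
for k = 1:200
    for a = 1:nA
        Q(:, a) = c(:, a) + gamma * P(:, :, a) * V;  % expected cost of a
    end
    V = min(Q, [], 2);                        % cheapest action per state
end
[~, policy] = min(Q, [], 2)                   % greedy cost-minimizing policy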
2.4.11 | Learning algorithms

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation.

Most of the algorithms used in training artificial neural networks employ some form of gradient descent. This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction. Evolutionary methods, simulated annealing, expectation-maximization, non-parametric methods and particle swarm optimization are some commonly used methods for training neural networks.

2.4.12 | Employing artificial neural networks

Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism that 'learns' from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential.

Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning.

Learning algorithm: There are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.

Robustness: If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust. With the correct implementation, ANNs can be used naturally in online learning and large data set applications. Their simple implementation and the mostly local dependencies exhibited in their structure allow for fast, parallel implementations in hardware.

2.4.13 | Applications

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

2.4.13.1 | Real-life applications

The tasks artificial neural networks are applied to tend to fall within the following broad categories:
 Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling.
 Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
 Data processing, including filtering, clustering, blind source separation and compression.
 Robotics, including directing manipulators and computer numerical control.

Application areas include system identification and control (vehicle control, process control, natural resources management), quantum chemistry, game playing and decision making (backgammon, chess, poker), pattern recognition (radar systems, face identification, object recognition and more), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial
applications (automated trading systems), data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

Artificial neural networks have also been used to diagnose several cancers. An ANN-based hybrid lung cancer detection system named HLND improves the accuracy of diagnosis and the speed of lung cancer radiology. These networks have also been used to diagnose prostate cancer. The diagnoses can be used to make specific models, taken from a large group of patients, which are compared with the information of one given patient. The models do not depend on assumptions about correlations between different variables. Colorectal cancer has also been predicted using neural networks, which could predict the outcome for a patient with colorectal cancer with greater accuracy than current clinical methods. After training, the networks could predict multiple patient outcomes from unrelated institutions.

2.4.13.2 | Neural networks and neuroscience

Theoretical and computational neuroscience is the field concerned with the theoretical analysis and computational modeling of biological neural systems. Since neural systems are intimately related to cognitive processes and behavior, the field is closely related to cognitive and behavioral modeling. The aim of the field is to create models of biological neural systems in order to understand how biological systems work. To gain this understanding, neuroscientists strive to make a link between observed biological processes (data), biologically plausible mechanisms for neural processing and learning (biological neural network models) and theory (statistical learning theory and information theory).

2.4.14 | Types of models

Many models are used in the field, defined at different levels of abstraction and modeling different aspects of neural systems. They range from models of the short-term behavior of individual neurons, through models of how the dynamics of neural circuitry arise from interactions between individual neurons, to models of how behavior can arise from abstract neural modules that represent complete subsystems. These include models of the long-term and short-term plasticity of neural systems and their relations to learning and memory, from the individual neuron to the system level.
2.4.15 | Neural network software

Neural network software is used to simulate, research, develop and apply artificial neural networks, biological neural networks and, in some cases, a wider array of adaptive systems.

2.4.16 | Types of artificial neural networks

Artificial neural network types vary from those with only one or two layers of single-direction logic to complicated multi-input, many-directional feedback loops and layers. On the whole, these systems use algorithms in their programming to determine the control and organization of their functions. Some may be as simple as a one-neuron layer with an input and an output, while others can mimic complex systems such as dANN, which can mimic chromosomal DNA through sizes at the cellular level, into artificial organisms and simulate reproduction, mutation and population sizes. Most systems use "weights" to change the parameters of the throughput and the varying connections to the neurons. Artificial neural networks can be autonomous and learn by input from outside "teachers", or even be self-teaching from written-in rules.

2.4.17 | Confidence analysis of a neural network

Supervised neural networks that use an MSE cost function can use formal statistical methods to determine the confidence of the trained model. The MSE on a validation set can be used as an estimate for variance. This value can then be used to calculate the confidence interval of the output of the network, assuming a normal distribution. A confidence analysis made this way is statistically valid as long as the output probability distribution stays the same and the network is not modified.

By assigning a softmax activation function on the output layer of the neural network (or a softmax component in a component-based neural network) for categorical target variables, the outputs can be interpreted as posterior probabilities. This is very useful in classification, as it gives a certainty measure on classifications.
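The softmax interpretation is easy to demonstrate in MATLAB; the raw output activations below are assumed values chosen only for illustration.

% Minimal sketch: turning raw output-layer activations into posterior
% probabilities with a softmax, giving a certainty measure per class.
z = [2.1; 0.4; -1.3];                 % raw activations for 3 classes (assumed)
p = exp(z) ./ sum(exp(z));            % softmax: entries of p sum to 1
[pmax, c] = max(p);                   % predicted class and its certainty
fprintf('class %d with probability %.2f\n', c, pmax)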
Chapter 3 | Image Processing

3.1 | INTRODUCTION

This chapter is an introduction on how to handle images in MATLAB. When working with images in MATLAB, there are many things to keep in mind, such as loading an image, using the right format, saving the data as different data types, displaying an image, converting between different image formats, etc. This worksheet presents some of the commands designed for these operations. Most of these commands require you to have the Image Processing Toolbox installed with MATLAB. To find out if it is installed, type ver at the MATLAB prompt. This gives you a list of the toolboxes installed on your system. For further reference on image handling in MATLAB you are recommended to use MATLAB's help browser. There is an extensive (and quite good) online manual for the Image Processing Toolbox that you can access via MATLAB's help browser.

The first sections of this worksheet are quite heavy. The only way to understand how the presented commands work is to carefully work through the examples given at the end of the worksheet. Once you can get these examples to work, experiment on your own using your favorite image!

3.1.1 | What Is Digital Image Processing?

Transforming digital information representing images.

3.1.2 | Motivating Problems:

1. Improve pictorial information for human interpretation.
2. Remove noise.
3. Correct for motion, camera position, and distortion.
4. Enhance by changing contrast and color.
5. Segmentation - dividing an image up into constituent parts.
6. Representation - representing an image by some more abstract models.
7. Classification.
8. Reduce the size of image information for efficient handling.
9. Compression with loss of digital information that minimizes the loss of "perceptual" information (JPEG, GIF, MPEG).
3.2 | COLOR VISION

The color-responsive chemicals in the cones are called cone pigments and are very similar to the chemicals in the rods. The retinal portion of the chemical is the same; however, the scotopsin is replaced with photopsins. Therefore, the color-responsive pigments are made of retinal and photopsins. There are three kinds of color-sensitive pigments:

• Red-sensitive pigment
• Green-sensitive pigment
• Blue-sensitive pigment

Each cone cell has one of these pigments, so that it is sensitive to that color. The human eye can sense almost any gradation of color when red, green and blue are mixed. The peak absorbance of the blue-sensitive pigment is 445 nanometers, for the green-sensitive pigment it is 535 nanometers, and for the red-sensitive pigment it is 570 nanometers.

MATLAB stores most images as two-dimensional arrays (i.e., matrices), in which each element of the matrix corresponds to a single pixel in the displayed image. For example, an image composed of 200 rows and 300 columns of different colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images, such as RGB, require a three-dimensional array, where the first plane in the third dimension represents the red pixel intensities, the second plane represents the green pixel intensities, and the third plane represents the blue pixel intensities.

To reduce memory requirements, MATLAB supports storing image data in arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit unsigned integers. These arrays require one-eighth or one-fourth as much memory as data in double arrays. An image whose data matrix has class uint8 is called an 8-bit image; an image whose data matrix has class uint16 is called a 16-bit image.
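These storage conventions can be inspected directly at the MATLAB prompt; the demo image below ships with the Image Processing Toolbox and is used here only as an assumed stand-in for any RGB image.

% Minimal sketch: an RGB image is an m-by-n-by-3 array, one plane per color.
rgb = imread('peppers.png');          % demo image, class uint8
size(rgb)                             % rows, columns and 3 color planes
red = rgb(:, :, 1);                   % first plane: red pixel intensities
whos rgb red                          % uint8 needs only 1 byte per element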
3.2.1 | Fundamentals

A digital image is composed of pixels, which can be thought of as small dots on the screen. A digital image is an instruction of how to color each pixel. We will see in detail later on how this is done in practice. A typical size of an image is 512-by-512 pixels. Later on in the course you will see that it is convenient to let the dimensions of the image be a power of 2; for example, 2^9 = 512. In the general case we say that an image is of size m-by-n if it is composed of m pixels in the vertical direction and n pixels in the horizontal direction.

Let us say that we have an image of 512-by-1024 pixels. This means that the data for the image must contain information about 512 × 1024 = 524288 pixels, which requires a lot of memory (stored as doubles, at 8 bytes per pixel, this single image already occupies over 4 MB). Hence, compressing images is essential for efficient image processing. You will later on see how Fourier analysis and wavelet analysis can help us to compress an image significantly. There are also a few "computer scientific" tricks (for example entropy coding) to reduce the amount of data required to store an image.

3.2.2 | Image Formats Supported by MATLAB

The following image formats are supported by MATLAB:
 BMP
 HDF
 JPEG
 PCX
 TIFF
 XWD

Most images you find on the Internet are JPEG images, which is the name of one of the most widely used compression standards for images. If you have stored an image, you can usually see from the suffix what format it is stored in. For example, an image named myimage.jpg is stored in the JPEG format, and we will see later on that we can load an image of this format into MATLAB.

3.2.3 | Working Formats in MATLAB

If an image is stored as a JPEG image on your disc, we first read it into MATLAB. However, in order to start working with an image, for example to perform a wavelet transform on it, we must convert it into a different format. This section explains four common formats.
3.3 | ASPECTS OF IMAGE PROCESSING

Image Enhancement: Processing an image so that the result is more suitable for a particular application (sharpening or deblurring an out-of-focus image, highlighting edges, improving image contrast, brightening an image, or removing noise).

Image Restoration: This may be considered as reversing the damage done to an image by a known cause (removing blur caused by linear motion, removal of optical distortions).

Image Segmentation: This involves subdividing an image into constituent parts, or isolating certain aspects of an image (finding lines, circles, or particular shapes in an image; in an aerial photograph, identifying cars, trees, buildings, or roads).

3.4 | IMAGE TYPES

3.4.1 | Intensity Image (Gray Scale Image)

This is the equivalent of a "gray scale image" and this is the image we will mostly work with in this course. It represents an image as a matrix where every element has a value corresponding to how bright/dark the pixel at the corresponding position should be colored. There are two ways to represent the number that describes the brightness of the pixel:

The double class (or data type). This assigns a floating-point number ("a number with decimals") between 0 and 1 to each pixel. The value 0 corresponds to black and the value 1 corresponds to white.

The other class is called uint8, which assigns an integer between 0 and 255 to represent the brightness of a pixel. The value 0 corresponds to black and 255 to white. The class uint8 only requires roughly 1/8 of the storage compared to the class double. On the other hand, many mathematical functions can only be applied to the double class. We will see later how to convert between double and uint8.

Fig. (3.1)
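The two brightness representations can be compared directly; the demo file below ships with the Image Processing Toolbox and stands in for any gray scale image.

% Minimal sketch of the double and uint8 pixel classes.
I8 = imread('cameraman.tif');         % uint8 intensity image, values 0..255
Id = im2double(I8);                   % double intensity image, values 0..1
max(Id(:))                            % 1 in double corresponds to 255 in uint8
I8back = im2uint8(Id);                % convert back for compact storage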
3.4.2 | Binary Image

This image format also stores an image as a matrix, but can only color a pixel black or white (and nothing in between). It assigns a 0 for black and a 1 for white.

3.4.3 | Indexed Image

This is a practical way of representing color images. (In this course we will mostly work with gray scale images, but once you have learned how to work with a gray scale image you will also know the principle of how to work with color images.) An indexed image stores an image as two matrices. The first matrix has the same size as the image, with one number for each pixel. The second matrix is called the color map, and its size may be different from the image. The numbers in the first matrix are instructions for which entry of the color map matrix to use when coloring each pixel.

Fig. (3.2)

3.4.4 | RGB Image

This is another format for color images. It represents an image with three matrices of sizes matching the image format. Each matrix corresponds to one of the colors red, green or blue and gives an instruction of how much of each of these colors a certain pixel should use.

3.4.5 | Multi-frame Image

In some applications we want to study a sequence of images. This is very common in biological and medical imaging, where you might study a sequence of slices of a cell. For these cases, the multi-frame format is a convenient way of working with a sequence of images. In case you choose to work with biological imaging later on in this course, you may use this format.
3.5 | HOW TO?

3.5.1 | How to Convert Between Different Formats

The following table shows how to convert between the different formats given above. All these commands require the Image Processing Toolbox!

Table (3.1): Image format conversion (within the parentheses you type the name of the image you wish to convert)

Operation                                                    MATLAB command
Convert intensity/indexed/RGB format to binary format        dither()
Convert intensity format to indexed format                   gray2ind()
Convert indexed format to intensity format                   ind2gray()
Convert indexed format to RGB format                         ind2rgb()
Convert a regular matrix to intensity format by scaling      mat2gray()
Convert RGB format to intensity format                       rgb2gray()
Convert RGB format to indexed format                         rgb2ind()

The command mat2gray is useful if you have a matrix representing an image whose values range between, let's say, 0 and 1000. The command mat2gray automatically rescales all entries so that they fall between 0 (black) and 1 (white), returning an intensity image of class double.
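A short, assumed example putting a few of these conversion commands together (any RGB image will do in place of the demo file):

% Minimal sketch using the conversion commands from Table (3.1).
rgb = imread('peppers.png');          % RGB demo image (assumed file)
I = rgb2gray(rgb);                    % RGB -> intensity
[Xind, map] = rgb2ind(rgb, 64);       % RGB -> indexed with a 64-color map
I2 = ind2gray(Xind, map);             % indexed -> intensity
M = mat2gray(magic(8));               % scale an arbitrary matrix into [0, 1]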
3.5.2 | How to Read Files

When you encounter an image you want to work with, it is usually in the form of a file (for example, if you download an image from the web, it is usually stored as a JPEG file). Once we are done processing an image, we may want to write it back to a JPEG file so that we can, for example, post the processed image on the web. This is done using the imread and imwrite commands. These commands require the Image Processing Toolbox!

Table (3.2): Reading and writing image files

Operation                                                    MATLAB command
Read an image. (Within the parentheses you type the name     imread()
of the image file you wish to read. Put the file name
within single quotes.)
Write an image to a file. (As the first argument within      imwrite()
the parentheses you type the name of the image you have
worked with. As the second argument you type the name of
the file and format that you want to write the image to.
Put the file name within single quotes.)

Make sure to use a semicolon (;) after these commands, otherwise you will get LOTS of numbers scrolling on your screen... The commands imread and imwrite support the formats given in the section "Image Formats Supported by MATLAB" above.
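For example (the file names here are assumed):

% Minimal sketch of reading and writing image files.
A = imread('myimage.jpg');            % read a JPEG file into a matrix
imwrite(A, 'mycopy.png');             % write it back out, here as PNG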
3.5.3 | Loading and Saving Variables in MATLAB

This section explains how to load and save variables in MATLAB. Once you have read a file, you probably convert it into an intensity image (a matrix) and work with this matrix. Once you are done, you may want to save the matrix representing the image in order to continue working with it at another time. This is easily done using the commands save and load. Note that save and load are commonly used MATLAB commands that work independently of which toolboxes are installed.

Table (3.3): Loading and saving variables

Operation                   MATLAB command
Save the variable X         save X
Load the variable X         load X

3.5.4 | How to Display an Image in MATLAB

Here are a couple of basic MATLAB commands (which do not require any toolbox) for displaying an image.

Table (3.4): Displaying an image given in matrix form

Operation                                                    MATLAB command
Display an image represented as the matrix X                 imagesc(X)
Adjust the brightness: s is a parameter such that            brighten(s)
-1 < s < 0 gives a darker image and 0 < s < 1 gives a
brighter image
Change the colors to gray                                    colormap(gray)

Sometimes your image may not be displayed in gray scale even though you might have converted it into a gray scale image. You can then use the command colormap(gray) to "force" MATLAB to use a gray scale when displaying an image. If you are using MATLAB with the Image Processing Toolbox installed, I recommend you to use the command imshow to display an image.

Table (3.5): Displaying an image given in matrix form (with the Image Processing Toolbox)

Operation                                                    MATLAB command
Display an image represented as the matrix X                 imshow(X)
Zoom in (using the left and right mouse buttons)             zoom on
Turn off the zoom function                                   zoom off
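A quick, assumed example combining the display commands above (the demo file ships with the Image Processing Toolbox):

% Minimal sketch of the display commands.
X = imread('cameraman.tif');
imagesc(X); colormap(gray);           % basic display, forced gray scale
brighten(0.3);                        % brighten the current color map
figure, imshow(X);                    % preferred display with the toolbox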
3.6 | SOME IMPORTANT DEFINITIONS

3.6.1 | Imread Function

A = imread(filename, fmt) reads a grayscale or true color image named filename into A. If the file contains a grayscale intensity image, A is a two-dimensional array. If the file contains a true color (RGB) image, A is a three-dimensional (m-by-n-by-3) array.

3.6.2 | Rotation

>> B = imrotate(A, ANGLE, METHOD)

Where:
A: your image.
ANGLE: the angle (in degrees) by which you want to rotate your image in the counterclockwise direction.
METHOD: a string naming the interpolation method ('nearest', 'bilinear' or 'bicubic'; see below). If you omit the METHOD argument, IMROTATE uses the default method of 'nearest'.

Note: to rotate the image clockwise, specify a negative angle. The returned image matrix B is, in general, larger than A to include the whole rotated image. IMROTATE sets invalid values on the periphery of B to 0.

3.6.3 | Scaling

IMRESIZE resizes an image of any type using the specified interpolation method.

3.6.4 | Interpolation

The supported interpolation methods are:
 'nearest' (default): nearest neighbor interpolation
 'bilinear': bilinear interpolation
 'bicubic': bicubic interpolation

B = IMRESIZE(A, M, METHOD) returns an image that is M times the size of A. If M is between 0 and 1.0, B is smaller than A. If M is greater than 1.0, B is larger than A. If METHOD is omitted, IMRESIZE uses nearest neighbor interpolation.

B = IMRESIZE(A, [MROWS MCOLS], METHOD) returns an image of size MROWS-by-MCOLS. If the specified size does not produce the same aspect ratio as the input image, the output image is distorted.

>> a = imread('image.fmt');  % put your image in place of image.fmt
>> B = imresize(a, [100 100], 'nearest');
>> imshow(B);
>> B = imresize(a, [100 100], 'bilinear');
>> imshow(B);
>> B = imresize(a, [100 100], 'bicubic');
>> imshow(B);
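A combined, assumed example of the rotation and scaling commands above (the demo file ships with the Image Processing Toolbox):

% Minimal sketch of imrotate and imresize together.
A = imread('cameraman.tif');
B = imrotate(A, 35, 'bilinear');      % 35 degrees counterclockwise
C = imrotate(A, -35, 'bilinear');     % negative angle rotates clockwise
D = imresize(B, 0.5, 'bicubic');      % shrink the rotated image to half size
imshow(D)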
3.7 | EDGE DETECTION

3.7.1 | Canny Edge Detector

1. Low error rate of detection: the results should match human perception well.
2. Good localization of edges: the distance between actual edges in an image and the edges found by a computational algorithm should be minimized.
3. Single response: the algorithm should not return multiple edge pixels when only a single edge exists.

3.7.2 | Edge Detectors

Fig. (3.4) and Fig. (3.5): black-and-white and color test images with the corresponding Canny and Sobel edge-detector outputs.

3.7.3 | Edge Tracing

b = rgb2gray(a);  % convert to gray scale first
edge(b, 'prewitt');
edge(b, 'sobel');
edge(b, 'sobel', 'vertical');
edge(b, 'sobel', 'horizontal');
edge(b, 'sobel', 'both');

We can only do edge tracing using gray scale images (i.e., images without color).
>> BW = rgb2gray(A);
>> edge(BW, 'prewitt')

Fig. (3.6): the result of Prewitt edge detection.

>> edge(BW, 'sobel', 'vertical')
>> edge(BW, 'sobel', 'horizontal')
>> edge(BW, 'sobel', 'both')

Table (3.6): Data types

Type      Description                       Range
int8      8-bit integer                     -128 to 127
uint8     8-bit unsigned integer            0 to 255
int16     16-bit integer                    -32768 to 32767
double    double-precision real number      machine specific

3.8 | MAPPING

3.8.1 | Mapping Images onto Surfaces

Overview
Mapping an image onto geometry, also known as texture mapping, involves overlaying an image or function onto a geometric surface. Images may be realistic, such as satellite images, or representational, such as color-coded functions of temperature or elevation. Unlike volume visualizations, which render each voxel (volume element) of a three-dimensional scene, mapping an image onto geometry efficiently creates the appearance of complexity by simply layering an image onto a surface. The resulting realism of the display also provides information that is not as readily apparent as with a simple display of either the image or the geometric surface.

Mapping an image onto a geometric surface is a two-step process. First, the image is mapped onto the geometric surface in object space. Second, the surface undergoes view transformations (relating to the viewpoint of the observer) and is then displayed in 2D screen space. You can use IDL Direct Graphics or Object Graphics to display images mapped onto geometric surfaces. The following table introduces the tasks and routines.

Table (3.7): Tasks and Routines Associated with Mapping an Image onto Geometry

Routine(s)/Object(s)                         Description
SHADE_SURF                                   Display the elevation data.
IDLgrWindow::Init, IDLgrView::Init,          Initialize the objects necessary for an
IDLgrModel::Init                             Object Graphics display.
IDLgrSurface::Init                           Initialize a surface object containing
                                             the elevation data.
IDLgrImage::Init                             Initialize an image object containing
                                             the satellite image.
XOBJVIEW                                     Display the object in an interactive IDL
                                             utility allowing rotation and resizing.

3.8.2 | Mapping an Image onto Elevation Data

The following Object Graphics example maps a satellite image from the Los Angeles, California vicinity onto a DEM (Digital Elevation Model) containing the area's topographical features. The realism resulting from mapping the image onto the corresponding elevation data provides a more informative view of the area's topography. The process is segmented into the following three sections:
• "Opening Image and Geometry Files"
• "Initializing the IDL Display Objects"
• "Displaying the Image and Geometric Surface Objects"
Note: Data can be either regularly gridded (defined by a 2D array) or irregularly gridded (defined by irregular x, y, z points). Both the image and elevation data used in this example are regularly gridded. If you are dealing with irregularly gridded data, use GRIDDATA to map the data to a regular grid.

Complete the following steps for a detailed description of the process.

Example Code: See elevation_object.pro in the examples/doc/image subdirectory of the IDL installation directory for code that duplicates this example. Run the example procedure by entering elevation_object at the IDL command prompt, or view the file in an IDL Editor window by entering .EDIT elevation_object.pro.

Opening Image and Geometry Files: The following steps read in the satellite image and DEM files and display the elevation data.

1. Select the satellite image:

imageFile = FILEPATH('elev_t.jpg', $
  SUBDIRECTORY = ['examples', 'data'])

2. Import the JPEG file:

READ_JPEG, imageFile, image

3. Select the DEM file:

demFile = FILEPATH('elevbin.dat', $
  SUBDIRECTORY = ['examples', 'data'])

4. Define an array for the elevation data, open the file, read in the data and close the file:

dem = READ_BINARY(demFile, DATA_DIMS = [64, 64])

5. Enlarge the size of the elevation array for display purposes:

dem = CONGRID(dem, 128, 128, /INTERP)

6. To quickly visualize the elevation data before continuing on to the Object Graphics section, initialize the display, create a window and display the elevation data using the SHADE_SURF command:

DEVICE, DECOMPOSED = 0
WINDOW, 0, TITLE = 'Elevation Data'
SHADE_SURF, dem

Fig. (3.7): Visual Display of the Elevation Data

After reading in the satellite image and DEM data, continue with the next section to create the objects necessary to map the satellite image onto the elevation surface.

3.8.3 | Initializing the IDL Display Objects

After reading in the image and surface data in the previous steps, you will need to create objects containing the data. When creating an IDL Object Graphics display, it is necessary to create a window object (oWindow), a view object (oView) and a model object (oModel). These display objects, shown in the conceptual representation in the following figure, will contain a geometric surface object (the DEM data) and an image object (the satellite image). These user-defined objects are instances of existing IDL object classes and provide access to the properties and methods associated with each object class.
Note: The XOBJVIEW utility (described in "Mapping an Image Object onto a Sphere") automatically creates window and view objects.

Complete the following steps to initialize the necessary IDL objects.

1. Initialize the window, view and model display objects. For detailed syntax, arguments and keywords available with each object initialization, see IDLgrWindow::Init, IDLgrView::Init and IDLgrModel::Init. The following three lines use the basic syntax

oNewObject = OBJ_NEW('Class_Name')

to create these objects:

oWindow = OBJ_NEW('IDLgrWindow', RETAIN = 2, COLOR_MODEL = 0)
oView = OBJ_NEW('IDLgrView')
oModel = OBJ_NEW('IDLgrModel')

2. Assign the elevation surface data, dem, to an IDLgrSurface object. The IDLgrSurface::Init keyword STYLE = 2 draws the elevation data using a filled line style:

oSurface = OBJ_NEW('IDLgrSurface', dem, STYLE = 2)

3. Assign the satellite image to a user-defined IDLgrImage object using IDLgrImage::Init:

oImage = OBJ_NEW('IDLgrImage', image, INTERLEAVE = 0, $
  /INTERPOLATE)

INTERLEAVE = 0 indicates that the satellite image is organized using pixel interleaving, and therefore has the dimensions (3, m, n). The INTERPOLATE keyword forces bilinear interpolation instead of using the default nearest neighbor interpolation method.

3.8.4 | Displaying the Image and Geometric Surface Objects

This section displays the objects created in the previous steps. The image and surface objects will first be displayed in an IDL Object Graphics window and then with the interactive XOBJVIEW utility.
1. Center the elevation surface object in the display window. The default Object Graphics coordinate system is [-1, -1] to [1, 1]. To center the object in the window, position the lower left corner of the surface data at [-0.5, -0.5, -0.5] for the x, y and z dimensions:

oSurface -> GetProperty, XRANGE = xr, YRANGE = yr, $
  ZRANGE = zr
xs = NORM_COORD(xr)
xs[0] = xs[0] - 0.5
ys = NORM_COORD(yr)
ys[0] = ys[0] - 0.5
zs = NORM_COORD(zr)
zs[0] = zs[0] - 0.5
oSurface -> SetProperty, XCOORD_CONV = xs, $
  YCOORD_CONV = ys, ZCOORD_CONV = zs

2. Map the satellite image onto the geometric elevation surface using the IDLgrSurface TEXTURE_MAP keyword:

oSurface -> SetProperty, TEXTURE_MAP = oImage, $
  COLOR = [255, 255, 255]

For the clearest display of the texture map, set COLOR = [255, 255, 255]. If the image does not have dimensions that are exact powers of 2, IDL resamples the image into a larger size whose dimensions are the next powers of two greater than the original dimensions. This resampling may cause unwanted sampling artifacts. In this example, the image does have dimensions that are exact powers of two, so no resampling occurs.

Note: If your texture does not have dimensions that are exact powers of 2 and you do not want to introduce resampling artifacts, you can pad the texture with unused data to a power of two and tell IDL to map only a subset of the texture onto the surface. For example, if your image is 40 by 40, create a 64 by 64 image and fill part of it with the image data:

textureImage = BYTARR(64, 64, /NOZERO)
textureImage[0:39, 0:39] = image  ; image is 40 by 40
oImage = OBJ_NEW('IDLgrImage', textureImage)

Then, construct texture coordinates that map the active part of the texture to a surface (oSurface):

textureCoords = [[], [], [], []]
oSurface -> SetProperty, TEXTURE_COORD = textureCoords

The surface object in IDL 5.6 has been enhanced to perform the above calculation automatically. In the above example, just use the image data (the 40 by 40 array) to create the image texture and do not supply texture coordinates; IDL computes the appropriate texture coordinates to correctly use the 40 by 40 image.

Note: Some graphics devices have a limit for the maximum texture size. If your texture is larger than the maximum size, IDL scales it down into dimensions that work on the device. This rescaling may introduce resampling artifacts and loss of detail in the texture. To avoid this, use the TEXTURE_HIGHRES keyword to tell IDL to draw the surface in smaller pieces that can be texture-mapped without loss of detail.

3. Add the surface object, covered by the satellite image, to the model object. Then add the model to the view object:

oModel -> Add, oSurface
oView -> Add, oModel

4. Rotate the model for better display in the object window. Without rotating the model, the surface is displayed at a 90-degree elevation angle, containing no depth information. The following lines rotate the model 90 degrees away from the viewer along the x-axis, then 30 degrees clockwise along the y-axis and the x-axis:

oModel -> ROTATE, [1, 0, 0], -90
oModel -> ROTATE, [0, 1, 0], 30
oModel -> ROTATE, [1, 0, 0], 30

5. Display the result in the Object Graphics window:

oWindow -> Draw, oView

Fig. (3.9): Image Mapped onto a Surface in an Object Graphics Window
6. Display the results using XOBJVIEW, setting SCALE = 1 (instead of the default value of 1/SQRT(3)) to increase the size of the initial display:

XOBJVIEW, oModel, /BLOCK, SCALE = 1

This results in the following display:

Fig. (3.10): Displaying the Image Mapped onto the Surface in XOBJVIEW

After displaying the model, you can rotate it by clicking in the application window and dragging your mouse. Select the magnify button, then click near the middle of the image. Drag your mouse away from the center of the display to magnify the image, or toward the center of the display to shrink the image. Select the left-most button on the XOBJVIEW toolbar to reset the display.

7. Destroy unneeded object references after closing the display windows:

OBJ_DESTROY, [oView, oImage]

The oModel and oSurface objects are automatically destroyed when oView is destroyed. For an example of mapping an image onto a regular surface using both Direct and Object Graphics displays, see "Mapping an Image onto a Sphere".
3.8.5 | Mapping an Image onto a Sphere

The following example maps an image containing a color representation of world elevation onto a sphere using both Direct and Object Graphics displays. The example is broken down into two sections:
• "Mapping an Image onto a Sphere Using Direct Graphics"
• "Mapping an Image Object onto a Sphere"

3.9 | MAPPING OFFLINE

In the absence of a network or online navigation services, we can still identify and follow a track by using image processing techniques. We incorporate a map image of the places familiar to the person and determine how to reach them and return clearly and safely. We calculate distances using the MATLAB function imdistline and, assuming a walking speed, estimate the time it takes to get from one point to another. We then guide the person with voice commands, for example to move forward or backward, or to turn left or right. In this way, the mapping feature can work in specific places (home, malls, libraries, etc.) without being online.
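The idea can be sketched in a few MATLAB lines; the map file name, the map scale and the walking speed below are all assumed values, not taken from the project code.

% Minimal sketch of the offline-mapping distance and time estimate.
map = imread('home_map.png');         % hypothetical stored map of a known place
imshow(map);
h = imdistline(gca);                  % draggable distance line between two points
api = iptgetapi(h);                   % access the tool's programmatic API
d_pixels = api.getDistance();         % measured distance in pixels
metersPerPixel = 0.05;                % assumed map scale
speed = 1.2;                          % assumed walking speed in m/s
d = d_pixels * metersPerPixel;        % distance in meters
t = d / speed;                        % estimated travel time in seconds
fprintf('Distance %.1f m, about %.0f s of walking.\n', d, t);

The resulting time and distance would then be converted to the voice commands described above (forward, backward, left, right) to guide the user along the stored route.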