Detection of Face using Viola Jones and
Recognition using Backpropagation
Neural Network
A dissertation submitted in fulfillment
of the requirements for the award of the degree of
Master of Technology
(Electronics & Communication Engineering)
in
THE NORTHCAP UNIVERSITY,
Gurgaon
by
Smriti Tikoo
(14-ECP-015)
under the guidance of
Nitin Malik
Department of Electrical, Electronics and Communication Engineering
SCHOOL OF ENGINEERING AND TECHNOLOGY
Gurgaon, Haryana
May-2016
INDEX

CONTENTS
Candidate's Declaration
Acknowledgement
Abstract
List of Abbreviations
List of Tables
List of Figures
CHAPTER 1: INTRODUCTION
1.1 PROBLEM DEFINITION
1.2 FACE DETECTION
1.3 FACE DETECTION TECHNIQUES
1.4 FACE RECOGNITION
1.5 FEATURE DETECTION
1.6 NEURAL NETWORKS
1.7 BACKPROPAGATION
1.8 COMPARISON WITH OTHER TECHNOLOGIES
1.9 PROBLEM STATEMENT
CHAPTER 2: LITERATURE REVIEW
CHAPTER 3: PROPOSED METHODOLOGY
3.1 VIOLA JONES ALGORITHM
3.2 HAAR FEATURES
3.3 INTEGRAL IMAGE
3.4 ADABOOST TRAINING ALGORITHM
3.5 CASCADING CLASSIFIERS
CHAPTER 4: SOFTWARE IMPLEMENTATION
4.1 MATLAB 8.2 (R2013B)
4.2 OPEN SOURCE COMPUTER VISION
4.3 COMPUTER VISION TOOLS
4.4 VISION CASCADE OBJECT DETECTOR
4.5 NEURAL NETWORK TOOLS
4.6 NEURAL NETWORK FITTING TOOL
4.7 TRAINING ALGORITHM
4.8 PROGRAMMING CODE FOR FACE DETECTION
4.9 PROCEDURE OF FACE RECOGNITION
CHAPTER 5: RESULTS AND DISCUSSION
5.1 RESULTS OF RECOGNITION PROCESS
CHAPTER 6: CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSIONS
6.2 FUTURE SCOPE
References
Publications
Plagiarism Report
CANDIDATE’S DECLARATION
I hereby declare that the work presented in this dissertation report, "Detection of Face using
the Viola Jones Algorithm and Recognition using a Backpropagation Neural Network", in
fulfillment of the requirements for the award of the degree of Master of Technology in
Electronics and Communication Engineering at THE NORTHCAP UNIVERSITY, Gurgaon,
is an authentic record of my own work, carried out under the guidance of Dr. Nitin Malik
(Associate Professor).
The matter embodied in this dissertation report has been submitted by me for the award of the
degree of Master of Technology.
Date:
Place: (Smriti Tikoo)
Acknowledgement
This research was mentored by Dr. Nitin Malik (Associate Professor) at The NorthCap
University, Gurgaon. I am grateful to him for his undivided support and professional expertise
in the area of research, and I would like to express my deepest gratitude for his moderation of
my research throughout, which made a significant difference in the course of the dissertation.
I would also like to thank all those who voiced their views on earlier versions of my work.
Abstract
This thesis is an attempt at facial recognition, a prominent topic in communities such as image
processing, neural networks and computer vision. Face detection and recognition have long
been prevalent among research scholars, and diverse approaches have been adopted to date to
serve the purpose. The rapid advent of biometric analysis systems, be they full-body scanners,
iris detection and recognition systems, fingerprint recognition systems, or surveillance systems
deployed for safety and security purposes, has contributed to the inclination towards the same.
Advances have been made with frontal and lateral views of the face, with facial expressions
such as anger, happiness and gloominess, and with both still and video images used for
detection and recognition. This has led to newer methods for face detection and recognition
being introduced to achieve results that are accurate, economically feasible and extremely
secure. Techniques such as Principal Component Analysis (PCA), Independent Component
Analysis (ICA) and Linear Discriminant Analysis (LDA) have been the predominant ones in
use, but with the improvements needed in these approaches, neural-network-based recognition
came as a boon to the industry: it enhanced not only the recognition but also the efficiency of
the process. Backpropagation was chosen as the learning method clearly for its efficiency in
recognizing non-linear faces, with an acceptance ratio of more than 90% and an execution time
of only a few seconds.
LIST OF ABBREVIATIONS
ANN Artificial Neural Network
BPNN Backpropagation Neural Network
DCT Discrete Cosine Transform
FFT Fast Fourier Transform
ICA Independent Component Analysis
LDA Linear Discriminant Analysis
MFC Multi Resolution Feature Concatenation
MLP Multi Layer Perceptron
MLNN Multi Layer Neural Network
MMV Multi Resolution Majority Voting
PCA Principal Component Analysis
LIST OF TABLES
Table 1 Sample data for sample images 1, 2, and 3.
Table 2 Sample data of sample image 2.

LIST OF FIGURES
Figure 1 Thumb impression as a biometric scan.
Figure 2 Fingerprint being computed into binary data.
Figure 3 Generalized block diagram of face recognition.
Figure 4 A group of faces detected.
Figure 5 Faces detected in a press conference.
Figure 6 Diagrammatic representation of the Viola Jones algorithm.
Figure 7 Simplified neural network.
Figure 8 Graphical representation of the sigmoidal function.
Figure 9 Flowchart of the Viola Jones algorithm.
Figure 10 Rectangular features used in Viola Jones.
Figure 11 Flow chart for face recognition using BPNN.
Figure 12 Finding the sum of a shaded rectangular area.
Figure 13 Cascading classifiers: a simplified diagram.
Figure 14 Training the cascade detector.
Figure 15 Faces detected by the Viola Jones algorithm, highlighted by bounding boxes.
Figure 16 Nose detected.
Figure 17 Mouth detected.
Figure 18 Eyes detected.
Figure 19 Sample images 1, 2, 3 in RGB, gray-scale and binary format.
Figure 20 Segmented image.
Figure 21 Binary image histogram equivalents.
Figure 22 Sample image 1 in RGB, sample image 2 in gray, and the histogram equivalent of
sample image 2.
Figure 23 A three-stage training procedure.
Figure 24 Parameters calculated during the training procedure.
Figure 25 Performance plot for sample data 1.
Figure 26 Performance plot of sample image 2.
CHAPTER 1. Introduction
1.1 Problem Definition
Automatic recognition dates back to the 1960s, when pioneers such as Woody Bledsoe, Helen
Chan Wolf, and Charles Bisson introduced their work to the world. In 1964 and 1965, Bledsoe,
along with Helen Chan and Charles Bisson, worked on programming a computer to recognize
human faces.
His work, however, received little recognition, owing to scant support from agencies. After him
the work was carried forward by Peter Hart at the Stanford Research Institute, where the results
given by the computer, when fed a gallery of images for recognition, were terrific.
Around 1997, Christoph von der Malsburg and students from several reputed universities got
together to design a system that was even better than the previous one. The system was robust
and could identify people from photographs that did not offer a clear view of their faces.
After that, technologies like Polar and FaceIt were introduced, which did the work in less time
and gave accurate results by offering features such as a 3D view of the picture and comparison
against a worldwide database. By 2006 many new algorithms had replaced the previous ones,
proving better by offering various new techniques for obtaining good results, and their
performance was commendable.
Automatic recognition has seen a lot of improvement since 1993: the error rate has dropped
and enhancement methods that improve resolution have hit the market. Ever since then,
biometric authentication has been all over the place, ranging from home security systems and
surveillance cameras to fingerprint and iris authentication systems and user verification.
These systems have contributed to catching terrorists and other kinds of intruders, whether
online or in person. Recognition of faces has been under research since the 1960s, but it is only
over the last decade or so that acceptable results have been obtained.
Figure 1: Thumb impression as biometric scan
Figure 2: Finger print being computed into binary data
Researchers are still trying to attain better models and approaches, to be able to recognize faces
just as our brain does with utmost triviality, whatever the lighting conditions, the time lapse of
years between generations, skin discoloration, or even intricate changes to faces such as Botox
or surgical treatments.
The approaches deployed do not account for the intricate details that differ from person to
person in the context of recognition results. The continuous changes in the expression of a face,
depending upon the situation being experienced by the person, disclose important information
regarding its structural characteristics and texture, thereby helping the recognition process.
The many methods adopted for face recognition can be broadly classified as (a) the holistic
approach, which relies on taking the whole face, in pixels, as input, and (b) feature extraction,
which concentrates on capturing the features of a face, with certain methods deployed to
remove redundant information and to handle the dimensionality problems involved.
Three basic approaches in this direction are PCA, LDA and ICA; others added to the list include
neural network, fuzzy, and skin- and texture-based approaches, as well as the ASM model. The
problem is approached by first detecting the face and its features and then moving on to
recognition. Many approaches are available for the detection of faces, such as the Viola Jones
algorithm, LBP, the AdaBoost algorithm, neural-network-based detection, and the SNoW
classifier method.
Figure 3: Generalised block diagram of face recognition
1.2 Face Detection
One of the foremost and most important steps in the procedure of facial recognition is the
detection of a face. It forms the first step towards the analysis of the entire face: alignment,
relighting, modeling, recognition, head pose tracking, authentication, and the tracking of the
face and its expressions.
Recognition by computers is necessary for them to be able to interpret people's intent. The aim
of facial detection is to detect any faces present in an image and return the image location and
extent of each face. For this, appropriate face modeling and segmentation must be done.
While carrying out face detection, the sources of variation in facial appearance should be
considered: viewing geometry (pose), illumination, the imaging process (resolution, focus,
imaging noise, perspective effects), and other factors such as occlusion.
Detection approaches that rely on image formation can help in detecting color, shape or motion
as well. Face detection is a specific case of object-class detection, in which the task is to find
the locations and sizes of all objects in an image that belong to a given class. Generally,
face-detection algorithms concentrate on frontal facial images.
The image is then matched against images in a database. Face detection is often used for
biometric authentication, video surveillance, human-computer interfaces and image database
management, and it has gained a great deal of interest among marketers as well.
Figure 4: A group of faces detected
Figure 5: Faces of people detected during a press conference
1.3 Face Detection Techniques
There are many techniques available for the purpose of detecting a face in an image. Some of
them are listed below:
Viola Jones Face Detection Algorithm: Proposed by Paul Viola and Michael Jones, it proves
to be one of a kind by providing competitive results in real-time situations. It not only aces the
detection of frontal faces but can also be modeled to detect a variety of other object classes. Its
robust nature and adaptability to practical situations make it popular among its users. It follows
a four-stage procedure for face detection: first Haar feature selection, then integral image
creation, followed by AdaBoost training and the cascading of classifiers.
Haar feature selection matches the commonalities found in human faces. The integral image
allows the rectangular features to be calculated in fixed time, which gives it an advantage over
more sophisticated features. The integral image at coordinates (x, y) gives the sum of the pixels
above and to the left of (x, y).
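The constant-time rectangle sum that this paragraph describes can be sketched in a few lines. The thesis implementation is in MATLAB; the following Python/NumPy version is only an illustrative sketch of the idea:

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of all pixels above and to the left of (x, y), inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in the rectangle rows r0..r1, cols c0..c1, via 4 lookups."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]          # subtract the strip above
    if c0 > 0:
        total -= ii[r1, c0 - 1]          # subtract the strip to the left
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]      # add back the doubly subtracted corner
    return total

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
# the 4-lookup sum agrees with summing the pixels directly
assert rect_sum(ii, 1, 1, 2, 3) == img[1:3, 1:4].sum()
```

Whatever the rectangle size, only four array lookups are needed, which is what makes evaluating many Haar features per window affordable.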
Advantages:
1. Popular for accurate face detection.
2. High detection rates, fit for real-time use.
3. High accuracy.
4. Reduced computation due to the use of cascaded classifiers.
Limitations:
1. Prolonged training time.
2. Handles only a limited range of head positions.
3. Dark-skinned faces may not be detected.
Figure 6: Diagrammatic representation of the Viola Jones algorithm
Local Binary Pattern (LBP): An efficient technique for describing the texture features of an
image. Its high computation speed and rotation invariance assist its usage in image retrieval,
texture analysis, recognition, and segmentation. One of its successful applications is detecting
objects in a video through elimination of the backdrop.
It assigns a texture value to each pixel, to facilitate reconciliation with the objective at hand
and to ease the process of tracking in video. LBP is chiefly beneficial for recognizing the main
points in a target and for forming a mask for joint colour-texture selection.
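The per-pixel texture coding described above can be illustrated with a minimal 3x3 LBP operator (an illustrative Python/NumPy sketch, not the thesis code; the clockwise bit ordering used here is one common convention among several):

```python
import numpy as np

def lbp_pixel(patch):
    """Basic LBP code for the centre of a 3x3 patch: each of the 8 neighbours
    contributes one bit, set when the neighbour >= the centre pixel."""
    centre = patch[1, 1]
    # neighbours taken clockwise from the top-left corner
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= centre:
            code |= 1 << bit
    return code

# a centre brighter than all neighbours codes to 0; darker than all, to 255
assert lbp_pixel(np.array([[0, 0, 0], [0, 9, 0], [0, 0, 0]])) == 0
assert lbp_pixel(np.array([[9, 9, 9], [9, 0, 9], [9, 9, 9]])) == 255
```

Because the code depends only on the sign of each comparison, adding a constant brightness offset to the whole patch leaves the code unchanged, which is the monotonic-illumination tolerance mentioned in the advantages below.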
Advantages:
1. Apt for expressing texture features.
2. Found effective for the analysis of texture, retrieval of images, and recognition and
segmentation of faces.
3. Spots a moving object with ease via elimination of the backdrop.
4. A simple approach, computationally simpler than Haar-like features, and fast.
5. The most vital properties of LBP features are tolerance of monotonic illumination
changes and computational simplicity.
Limitations:
1. Insensitive to minute changes in the localization process.
2. The error rate rises with the use of large local regions.
3. Ineffective under non-monotonic illumination changes.
4. Usage is restricted to binary and grey-scale images.
AdaBoost Algorithm for Face Detection: Its basic aim is to create a superior prediction rule
that is highly accurate, based on combining comparatively weak, inaccurate rules. It is the first
practical boosting algorithm to have garnered wide acceptance in various fields. It contributed
high detection rates, and thereby the processing of images was done at a faster rate.
From the selection of visual features to the linear combination of classifiers, AdaBoost does it
all to make a strong classifier. Overfitting seems to be one of the few restrictions on the
boosting algorithm, besides noisy data and outliers.
Its "adaptive" name is due to its tendency to generate a strong classifier by carrying out
multiple iterations. A strong classifier is formed by cascading weak classifiers, each only
slightly correlated with the true classifier.
The iterative process of adding weak classifiers to the chain that forms a strong classifier
proceeds with the addition of a weight vector that concentrates on misclassified examples.
The outcome is a classifier that has higher accuracy than the weak learners' classifiers.
Advantages:
1. Its usage extends to various classifiers.
2. It enhances the accuracy levels of a classifier.
3. It is easy to implement and commonly used.
4. Prior knowledge about the face structure is not required; only a pair of inputs needs to be
fed: a training database and the classification functions.
5. It is an adaptive algorithm: the positive and negative examples are tested by the classifier at
every stage to avoid misclassification.
6. The error generated during training tends exponentially towards zero in a finite number of
iterations.
7. It is easy to program, and its feature selection helps give a classifier that is fast and simple.
Limitations:
1. Overfitting, noisy data and outliers pose a restriction on the algorithm.
2. The quality of detection relies on the consistency of the training set, its size and its interclass
variability.
3. Training is time-consuming, as computation time is proportional to the size of the feature
and example sets.
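The reweighting loop described above can be made concrete with a toy AdaBoost over one-dimensional threshold stumps (an illustrative Python sketch; the detector's actual weak learners are Haar features, not 1-D thresholds):

```python
import math

def adaboost(xs, ys, n_rounds=5):
    """Toy AdaBoost with threshold stumps on 1-D data; ys are +1/-1.
    Returns a list of (threshold, polarity, alpha) weak classifiers."""
    n = len(xs)
    w = [1.0 / n] * n                      # uniform initial example weights
    ensemble = []
    for _ in range(n_rounds):
        # pick the stump (threshold, polarity) with the lowest weighted error
        best = None
        for t in xs:
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                          if (pol if xi >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote of this weak learner
        ensemble.append((t, pol, alpha))
        # boost the weight of misclassified examples, then renormalise
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * (pol if x >= t else -pol) for t, pol, alpha in ensemble)
    return 1 if score >= 0 else -1

ensemble = adaboost([1, 2, 3, 4], [-1, -1, 1, 1])
assert predict(ensemble, 0) == -1 and predict(ensemble, 5) == 1
```

The `exp(-alpha * y * h(x))` update is exactly the "weight vector concentrating on misclassified examples" mentioned above: correctly classified examples shrink in weight, misclassified ones grow.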
1.4 Face Recognition System: The modern era has brought about a revolution by bringing the
world closer to computerized applications. With scenarios such as breaches of security,
terrorist attacks, cyber crimes and fraud becoming rampant via various modes, the requirement
for authentication systems and verification technologies has become the need of the hour.
Although various biometric traits are available for carrying out the process, we have focused
on face recognition, as it is flexible and reliable irrespective of whether a person is aware that
his or her face is being scanned. In this procedure user identification plays a vital role in
authenticating a face against the available database.
While designing, feasibility and effective recognition should be kept in mind, and a simple,
robust system should be the sought-after solution. This makes wide acceptance possible and
makes the system socially popular as well. Personal verification processes also utilize
biometrics to make their procedures more accurate and efficient.
The major factors influencing the recognition method are illumination and the various pose
variations at hand. After face detection, feature extraction is the next step on the road to face
recognition, and a lot of approaches are available for extracting features. The recognition
process encompasses the use of an automated, semi-automated or manual way of matching
the facial images.
Most common by far is 2-dimensional face recognition, which is easier and less expensive
than the other approaches. The recognition system has the flexibility to operate in either of
two modes, i.e. verification or identification. But the course of action is not as easy as the
brain makes it feel, and such systems work appropriately only under certain set conditions.
Degradation in performance is often observed when the captured images lack appropriate
lighting, or when changing poses impose a restriction, apart from facial expressions. Some
recognition algorithms use landmarks or features from the image in which the face is to be
detected.
Others resort to normalization and compression of the facial database. Popular recognition
algorithms include Principal Component Analysis using eigenfaces, Linear Discriminant
Analysis, Elastic Bunch Graph Matching using the Fisherface algorithm, the Hidden Markov
model, Multilinear Subspace Learning using tensor representations, and neuronally motivated
dynamic link matching.
A newly emerging trend is 3-dimensional recognition, which is argued to provide enhanced
accuracy; it utilizes sensors to capture the facial image of a person along with information
regarding its shape and other properties, such as the contours of the eye sockets.
1.5 Feature Detection:
It refers to methods that compute abstractions of image information and make decisions at
each image point as to whether or not there is a feature to capture there. The results of this
modus operandi are subsets of the original image domain, in the form of isolated points,
curves, etc.
By definition a feature can be understood as a distinctive part of an image, depending on the
situation. Features are often the starting point of an algorithm, and therefore the subsequent or
overall algorithm will only be as effective as its feature detector. One of the prominent
requirements is repeatable features, which can be detected in different images of the same
type. Feature detection is a low-level operation in image processing and is performed after the
detection of the face. It involves scanning each pixel to look for features; for a lengthy
algorithm, the image is scanned only in the regions of interest.
Types of Features: Apart from the usually visible features of a face, i.e. the eyes, nose, mouth,
lips and forehead, there are various other features that are also helpful in verifying someone's
identity.
Edges: Points forming a boundary that separates or joins two image regions. Technically
speaking, points having a high gradient magnitude classify as edges. Usually these are
one-dimensional.
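The "high gradient magnitude" criterion can be illustrated with a small Sobel-style sketch (Python/NumPy; a simplification for illustration, not a full edge detector):

```python
import numpy as np

# Sobel kernels approximating the horizontal and vertical intensity gradients
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
KY = KX.T

def gradient_magnitude(img):
    """Gradient magnitude at each interior pixel via a 3x3 Sobel convolution."""
    h, w = img.shape
    out = np.zeros((h, w))
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            patch = img[r - 1:r + 2, c - 1:c + 2]
            gx = (KX * patch).sum()
            gy = (KY * patch).sum()
            out[r, c] = np.hypot(gx, gy)
    return out

# a vertical step edge: the gradient is high only around the boundary column
img = np.zeros((5, 6))
img[:, 3:] = 1.0
mag = gradient_magnitude(img)
assert mag[2, 1] == 0     # flat region, no gradient
assert mag[2, 3] > 0      # on the step, strong gradient
```

Thresholding such a magnitude map is the simplest way of declaring "edge" pixels; practical detectors add smoothing and non-maximum suppression on top.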
Corners / interest points: Point-like features anywhere in a facial image classify as corners.
They are two-dimensional, unlike edges. Corner points were discovered because the edges
analyzed in earlier works showed rapid changes in direction.
Algorithms were then developed to avoid detailed edge detection; a corner is instead detected
as a high level of curvature in the gradient. The next hurdle at hand was the detection of
corners on parts of images that were not true corner points.
Blobs / regions of interest: Provide a corresponding image description in terms of regions.
The algorithms employed for detecting blobs find areas that are too smooth to be detected as
corner points. Blob descriptors may contain points such as a centre of gravity; distinguishing
between a corner and a blob becomes difficult if an image is reduced from its actual size.
Ridges: Especially for elongated objects, in the context of a grey-level image a ridge is a
generalization of the medial axis; in realistic situations it is treated as a one-dimensional curve
depicting an axis of symmetry. Ridges are harder to extract from images.
Feature Extraction: On completion of the feature detection process, the next step is feature
extraction, which captures an image patch around each detected feature. Extraction involves
considerable amounts of image processing, and the outcome is called a feature descriptor or
a feature vector.
Approaches available for it are N-jets and local histograms. In addition to such attribute
information, the feature detection step by itself may also provide complementary attributes,
such as the edge orientation and gradient magnitude in edge detection, and the polarity and
strength of the blob in blob detection. Feature extraction is a hybrid method of data reduction:
feature extraction techniques extract a set of salient or discriminatory features, holistic or
local, from the face image, known as the feature space. Represented directly by its pixels, an
image would otherwise form an impractically high-dimensional feature space.
1.6 Neural Networks:
The term defines a family of algorithms built from multiple layers of interconnected units,
inspired by the neuron structure of the brain. Each circular node represents an artificial neuron,
and an arrow represents a connection from the output of one neuron to the input of another.
These networks are designed to approximate functions of huge numbers of inputs that are
generally unknown. The nodes in a neural network exchange data with one another. The
connections between the nodes have numeric weights associated with them, which are altered
to attain an output matching the target output in the case of supervised learning, or to make
neural nets adaptive to inputs and capable of learning.
A network can be classified as neural on the basis of the following criteria: it has a set of
adaptive weights, and it is capable of approximating non-linear functions of its inputs. The
adaptive weights can be thought of as connection strengths between neurons, which are tuned
during training and used during prediction.
Feed Forward Neural Networks: Networks in which the connections between the nodes do
not form cycles, unlike recurrent neural networks. Feed-forward neural networks are the
simplest networks: the information moves in the forward direction only, from the inputs to
the outputs via the hidden layers in between.
Supervised Learning: The desired outputs are known, and the associated weights are updated
so as to compute an actual output that matches the desired output, e.g. pattern recognition and
regression models.
Unsupervised Learning: A learning method where the input data has no set responses against
which to compare the actual output, e.g. cluster analysis.
Single Perceptron: Consists of a single layer of inputs and outputs along with associated
numerical weights. The sum of the products of the weights and the inputs is calculated at each
node, and if the value is above some threshold the neuron takes the activated value; otherwise
it stays deactivated. Such neurons classify as linear threshold units. Usually the activation
holds a value of +1 and the deactivation -1, keeping the threshold at 0.
The delta rule is utilized for training the perceptron; it calculates the error arising from
unmatched actual and target outputs. In a multi-layer neural network a continuous output is
obtained. A common choice of activation is the so-called logistic function:

f(x) = 1 / (1 + e^(-x))    (eq. 1.1)

The logistic function is an S-shaped function whose argument x ranges from -∞ to +∞.
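The delta-rule training of a single linear threshold unit just described can be sketched as follows (illustrative Python; the ±1 outputs and zero threshold follow the convention in the text, while the learning rate, epoch count and AND task are arbitrary choices for the example):

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Delta-rule training of a single linear threshold unit.
    samples: list of (inputs, target) pairs with targets in {+1, -1}."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if s > 0 else -1       # threshold at 0, outputs +1/-1
            err = target - y             # delta-rule error term
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# learn logical AND, which is linearly separable
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(data)
assert [classify(w, b, x) for x, _ in data] == [-1, -1, -1, 1]
```

On a non-separable task such as XOR this single unit cannot converge, which is what motivates the multilayer perceptron discussed next.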
Multilayer Perceptron: A two-layer neural network with the ability to perform the XOR
operation. In the figure shown on the next page, the numerals within the neurons represent
the thresholds, and the numbers depicted on the interconnections are the weights of the inputs.
In this case, if the threshold is not attained then the output is zero.
This class of networks consists of multiple layers of computational units, usually
interconnected in a feed-forward way, and usually employing the sigmoid function as the
activation function. As per the universal approximation theorem, every continuous function
that maps intervals of real numbers to some output interval can be estimated by an MLP with
a single hidden layer. Most MLPs employ backpropagation as their learning technique.
Figure 7: A generalized structure of neural network.
Sigmoid Function: A mathematical function having an S-shaped curve. It is usually a special
case of the logistic function.
Figure 8: Graphical representation of sigmoidal function.
It is defined by S(t) = 1 / (1 + e^(-t)). It is a bounded, differentiable real function that is
defined for all real values and has a positive derivative: S'(t) = S(t)(1 - S(t)).
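The identity S'(t) = S(t)(1 - S(t)) is easy to verify numerically against a finite-difference approximation (illustrative Python):

```python
import math

def sigmoid(t):
    """S(t) = 1 / (1 + e^(-t)), the S-shaped logistic activation."""
    return 1.0 / (1.0 + math.exp(-t))

def sigmoid_derivative(t):
    """Closed form S'(t) = S(t) * (1 - S(t))."""
    s = sigmoid(t)
    return s * (1.0 - s)

# compare the closed form against a central finite difference
h = 1e-6
for t in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
    assert abs(numeric - sigmoid_derivative(t)) < 1e-8

assert sigmoid(0.0) == 0.5
```

This cheap derivative, expressible in terms of the activation itself, is one reason the sigmoid pairs so naturally with backpropagation, described next.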
1.7 Backpropagation
Also called "backward propagation of errors", it is a popular method of training MLP
artificial neural networks in conjunction with an optimization method. The gradient of the
loss function is calculated with respect to the associated weights; the optimization method is
fed this gradient and alters the weights so as to reduce the loss function.
This algorithm works on the supervised learning rule, i.e. it requires a known, desired output
for each input value in order to calculate the loss function gradient. It usually makes use of
the sigmoid function as its activation. The algorithm can be split into two phases:
Phase 1: Propagation:
1. Forward propagation: the input is fed through the network to generate the output
activations.
2. Backward propagation: the output error is propagated backwards through the network in
order to generate the difference between the actual and target outputs at each neuron.
Phase 2: Weight update:
1. The gradient of a weight is the product of the output difference (error) and the input
activation.
2. A ratio (percentage) of the gradient is subtracted from the weight.
This ratio, the learning rate, affects the speed and quality of learning: with a greater ratio the
neurons are trained quickly, but accuracy is better assured with a lower ratio. If a weight
update overshoots in the positive direction, the error will increase. The steps are repeated until
satisfactory results are attained.
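The two phases above can be sketched with a minimal sigmoid network trained on XOR by plain backpropagation (illustrative Python with arbitrary seed, learning rate and layer sizes; the thesis's own implementation uses MATLAB's neural network tools):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(epochs=3000, lr=0.5, hidden=3, seed=1):
    """Train a tiny 2-input / `hidden`-unit / 1-output sigmoid network on XOR.
    Returns (loss_before, loss_after, forward)."""
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(2)]
    b_h = [rng.uniform(-1, 1) for _ in range(hidden)]
    w_ho = [rng.uniform(-1, 1) for _ in range(hidden)]
    b_o = rng.uniform(-1, 1)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    def forward(x):
        h = [sigmoid(x[0] * w_ih[0][j] + x[1] * w_ih[1][j] + b_h[j])
             for j in range(hidden)]
        o = sigmoid(sum(h[j] * w_ho[j] for j in range(hidden)) + b_o)
        return h, o

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in data)

    before = loss()
    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            # Phase 1: propagate the output error backwards through the net
            delta_o = (o - t) * o * (1 - o)
            delta_h = [delta_o * w_ho[j] * h[j] * (1 - h[j])
                       for j in range(hidden)]
            # Phase 2: weight update = learning rate * error * input activation
            for j in range(hidden):
                w_ho[j] -= lr * delta_o * h[j]
                w_ih[0][j] -= lr * delta_h[j] * x[0]
                w_ih[1][j] -= lr * delta_h[j] * x[1]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
    return before, loss(), forward

before, after, net = train_xor()
assert after < before   # training reduces the squared error
```

Each weight update follows the rule stated above: the gradient is the neuron's error term times its input activation, and a learning-rate fraction of it is subtracted from the weight.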
1.8 Comparison to Other Technologies: Of the available biometric authentication methods,
face recognition is the easiest and most reliable: no cooperation is required from the subject
being captured for the identification and verification process. It is widely used at airports,
defence ministries, multiplexes and other areas that experience threats to the security of data
or people.
Other biometrics such as fingerprints, iris scans and speech recognition cannot perform mass
identification the way facial recognition systems can. However, queries are being raised about
the efficiency of such systems.
Drawbacks: Though it is flexible and socially acceptable, it is not a perfect way to solve the
problems at hand related to the security of a country and its people. Face recognition is not
perfect and struggles to perform under certain conditions.
Certain constraints, such as the angles at which images are captured, poor illumination,
age-related or surgical changes in the face, image resolution, cluttered images, skin colour
and facial expressions, do pose obstacles.
1.9 Problem Statement
The advent of biometrics brought about a change in the way the security of data, people and
country is treated. Be it via surveillance systems, full-body scans, retina scans, hand scans or
fingerprints, it has not only changed the security setup of the country but also offers a sense
of reliability to users and systems. It has invented ways to protect and preserve confidential
data and to guard systems of any type with the help of human-computer interaction. It has
brought a great combination of image processing and computer vision to light, which in turn
has boosted the business of detection and recognition systems. In this dissertation we propose
a similar approach, detecting and recognizing a facial image using a BPNN with the help of
MATLAB. Within MATLAB we have worked with the neural network toolbox, using its
tools to train and process the image and obtain the performance plot.
CHAPTER 2. Literature Review
Whatever the approach, be it neural-network based [1][2] or otherwise, a large database of
facial as well as non-facial images is required to carry out grey-scale-based face detection.
The non-facial images pose a hurdle in that case.
Schneiderman and Kanade [4] have lent their modus operandi to frontal-face and profile-view
detection.
The approach for non-faces is a feature-based method that encompasses geometrical attributes
with belief networks [5]. For real-time situations [6], geometrical facial templates and the
Hough transform can also be put to use. Facial detectors such as Markov random fields [7][8]
and Markov chains [9] detect faces using the spatial arrangement of grey-scale pixel values.
For tracking faces [10] whose initial position is known, model-based approaches are the
answer; e.g. a 3D deformable model is good at tracking moving faces, provided that other
facial features are already known. Motion and colour are a way of reducing the burden on
detection algorithms; combined with other available information they can contribute to fast
detection and tracking [11].
In one research paper, the combination of motion-based tracking with the Hidden Markov
model has already been used, and many proposals have been published in which neural
networks are combined with motion and colour filters [12]. Zhao [14] provided a survey of
the tasks of face recognition both for images in a video and for still images.
Turk [15] calculated the eigenfaces with linear algebra techniques. This method enabled a
real-time automated face recognition system. Liao [16] opted for fast Fourier transform
algorithms before applying recognition.
Face recognition is then performed using Principal Component Analysis (PCA) and
Independent Component Analysis (ICA) after the FFT has whitened the input images. Results
have proved that the whitening process enhances the recognition performance of both PCA
and ICA.
Contrast enhancement and edge detection were used by Jarillo [17] for carrying out
transformations of the input images. Discriminatory information helpful for face
classification can be obtained from such experiments, namely with PCA and LDA (Linear
Discriminant Analysis); the YALE and FERET databases were used. Lin [18] opted for the
Hausdorff distance over the Euclidean distance in the recognition procedure, and satisfactory
results were obtained. The wavelet transform in combination with PCA was put to use by
A. Eleyan [19].
Two innovative methods were proposed: Multiresolution Feature Concatenation (MFC), where PCA is used for dimensionality reduction on each sub-band and the resulting sub-bands are concatenated before classification, and Multiresolution Majority Voting (MMV), where PCA and classification are done separately on each sub-band. MMV outperforms the MFC approach.
Shermina J. [20] used a DCT approach to minimize the illumination problem; the DCT-processed image is then used in the recognition step, for which PCA is employed. Unauthorized access is regulated by designing systems that provide security and reliability, so building efficient face recognition systems is essential.
Agarwal et al. [3] devised an approach of coding and decoding an image to recognize a face. It was carried out in two stages: first PCA was used for feature extraction, and then a feed-forward BPNN was used for recognition. Efficiency comparisons were made with K-means and Fuzzy Ant with fuzzy C-means, and a better recognition rate was obtained than with the other two techniques.
Bhati et al. [4] opted for eigenfaces as the feature vectors. The paper took a novel route by including aspects such as mean square error, recognition rate and training time, showing that a single-layer neural network can give 100% recognition accuracy if the correct learning rate is assigned to it. They proposed that the BPNN is more sensitive to the number of hidden neurons and the learning rate, and that using wavelets for feature extraction improved the results of both neural networks. Kim et al. [5] suggested improving the recognition rate by combining a multilayer neural network (MLNN) with a method that computes a small number of eigenface feature vectors through PCA.
The method improved performance under variations in illumination and noise, minimizing the recognition error. The rectangular-feature technique was used to extract the face from the background image; recognition was then performed by reducing the image size and denoising the image, evaluating a proper feature vector through PCA, and finally applying the MLNN to the resultant image.
The existing PCA matching method was compared with the combined PCA-MLNN approach by Benzaoui et al. [6], who proposed a technique involving two stages: learning and detection. In the first phase, the images were converted to gray scale and then filtered with a median filter to reduce noise. Feature extraction was done by choosing appropriate information from each picture to prepare the data for classifier learning. The system used the DCT to extract a vector of coefficients symbolizing the parameters necessary to model the original image. There were problems due to variations in posture,
so Saudagare et al. [7] proposed a system involving the inhibition of the consequences of changes in illumination and pose. They proposed a 2-level Haar wavelet transform to partition the face image into seven sub-image bands. From the sub-image bands, eigenface features were extracted and used as input for a BPNN-based classification algorithm. Denoising the images was necessary for enhanced and accurate outcomes, so Daramola et al. [8] designed a system with the following procedure: face detection, pre-processing, principal component analysis (PCA) and classification. First, a database of images in different poses was made, followed by normalization and denoising of the images. Eigenfaces were calculated from the training set; PCA was used to retain the eigenvectors with the largest eigenvalues over the training-set images.
In the ANN approach, face descriptors were used as input for training the network. The non-linearity of face images was addressed by Kashem et al. [9], who presented a computational model for face detection and recognition that was accurate, simple and fast in constrained environments. Eigenfaces [10] were used for recognition to improve accuracy and reduce computation. When PCA and BPNN techniques were combined, non-linear face images were easily distinguished; it was concluded that this method had an execution time of a few seconds and an acceptance ratio of more than 90%.
In [11], Chung et al. proposed a new face recognition method based on Principal Component Analysis (PCA) and Gabor filter responses. First, Gabor filtering was applied at predefined fiducial points to extract robust facial features from the original face image. Second, the facial features were transformed into eigenspace by PCA for optimal classification of individual facial representations.
CHAPTER 3 Proposed Methodology
The advent of biometrics changed the way the security of data, people and nations is treated. Be it surveillance systems, full-body scans, retina scans, hand scans or fingerprints, biometrics has not only changed the security setup of a country but also offers a sense of reliability to users and systems alike. It has produced ways to protect and preserve confidential data and to guard systems of any type through human and computer interaction, and it has brought a powerful combination of image processing and computer vision to light that has boosted the business of detection and recognition systems. In this work we propose a similar approach to detect and recognize a facial image using a BPNN with the help of MATLAB 8.2. Within MATLAB we have worked with the neural network toolbox, using the neural network fitting tool to train and test the facial image at hand. This tool maps numeric inputs to their targets; it supports training, testing and validating the data and evaluates the result by the mean squared error generated. The performance plot, and if needed the regression plot for the same data, can also be used to draw an inference about the performance of the process. The steps are:
1. Detection of the face and its features using the Viola Jones algorithm.
2. Conversion from RGB to grayscale and binary.
3. Segmentation of the image, which can be done in any format.
4. Computation of the histogram equivalent of the binary or grayscale image.
5. Each segmented part is trained individually using the backpropagation neural network tool.
6. The above steps can also be carried out without segmentation of the image.
7. For the recognition process, the neural network tool in MATLAB is used to train and test the sample data provided by the histogram equivalent of the image taken.
8. Within the neural network tool, the curve fitting tool is used to train the data.
9. The training of data takes place in three stages: training, testing and validation.
10. The mean squared error is the parameter used to judge whether the training is giving desirable results.
11. The data is retrained to achieve better performance plots with the least mean squared error.
12. The training phase continues with different samples in order to check the performance of the network and obtain better results.
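The conversion and histogram steps above (steps 2-4) can be sketched in a few lines. The thesis implementation uses MATLAB functions such as rgb2gray and imhist; the Python version below is only an illustrative stand-in, with a tiny hypothetical 2x2 image in place of a real photograph:

```python
# Illustrative sketch of steps 2-4: RGB -> grayscale -> binary, plus the
# histogram that later feeds the neural network. Helper names are ours.

def rgb_to_gray(image):
    """Luminance conversion using the same weights as MATLAB's rgb2gray."""
    return [[round(0.2989 * r + 0.5870 * g + 0.1140 * b)
             for (r, g, b) in row] for row in image]

def gray_to_binary(gray, threshold=128):
    """Each pixel becomes 1 or 0 depending on the intensity threshold."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]

def histogram(gray, bins=256):
    """Count how many pixels fall into each intensity level."""
    h = [0] * bins
    for row in gray:
        for p in row:
            h[p] += 1
    return h

rgb = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 0, 255)]]       # white, black, red, blue pixels
gray = rgb_to_gray(rgb)
binary = gray_to_binary(gray)
hist = histogram(gray)
print(gray)     # grayscale intensities
print(binary)   # thresholded (binary) image
```

The resulting histogram vector is the kind of fixed-length numeric sample that the neural network fitting tool can map to its targets.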
3.1 Viola Jones Algorithm:
It is the preferred algorithm for frontal face detection due to its high accuracy and good adaptability to real-time situations. It is trained across 5000 frontal faces and 300 million non-face sub-windows. The faces are normalized, scaled and then translated; faces themselves offer numerous variations due to pose, marks, illumination, angle and innumerable other factors. The key challenge in face detection is that every image, whatever it contains, has on the order of 10-50 thousand candidate locations and scales, while faces are rare: typically 0-50 per image.
Of the four stages described, the first, Haar feature selection, matches the regularities found in face images: the cheek regions are lighter than the eye regions, and the region enclosing the bridge of the nose is brighter than the eyes.
Figure 9: Flow chart for Viola Jones Algo
Features matched by the algorithm have a location and size (eyes, mouth, bridge of the nose) and a value (oriented gradients of pixel intensities). For rectangle features, the sum of pixels under the black region is subtracted from the sum under the white region to obtain the feature value. Rectangular boxes are placed over the features being detected to show how the pixel count is done.
The integral image allows all the rectangular features to be calculated in constant time, which speeds up their evaluation considerably compared with computing pixel sums directly.
A variant of the Ada Boost learning algorithm is employed both to select the best features and to train the classifiers that use them. A strong classifier is then constructed by cascading the previously trained weak classifiers: the more complex the structure of a classifier, the better and stronger it becomes for detection purposes.
(Flowchart stages: input image, Haar feature, integral image, Ada Boost, cascading.)
Figure 10: Rectangular features used in the Viola Jones algorithm
Evaluating every classifier on every sub-window takes a lot of time. To avoid this, each successive classifier examines only the sub-windows accepted by the previous ones. Since every feature is related to a location in a sub-window, if a sub-window is rejected from inspection by a classifier, no further processing happens for that window and a new sub-window is searched. The very first classifier has the job of roughly halving the number of times the entire cascade must be evaluated. Every stage of the cascade contains a strong classifier, so the features are grouped into stages, with every stage possessing certain features. Each stage checks whether or not a given sub-window classifies as a face; if not, it is discarded.
3.2 Haar Features:
They are digital image features used in object recognition. They owe their name to their
intuitive similarity with Haar wavelets and were used in the first real-time face detector.
Historically, working with only image intensities (i.e., the RGB pixel values at each and every
pixel of image) made the task of feature calculation computationally expensive.
Viola and Jones adapted the idea of using Haar wavelets and developed the so-called Haar-like
features. A Haar-like feature considers adjacent rectangular regions at a specific location in a
detection window, sums up the pixel intensities in each region and calculates the difference
between these sums.
This difference is then used to categorize subsections of an image. For example, let us say we
have an image database with human faces. It is a common observation that among all faces the
region of the eyes is darker than the region of the cheeks.
Therefore a common Haar feature for face detection is a set of two adjacent rectangles that lie above the eye and the cheek region. The position of these rectangles is defined relative to a detection window that acts like a bounding box for the target object (the face in this case).
In the detection phase of the Viola–Jones object detection framework, a window of the target
size is moved over the input image, and for each subsection of the image the Haar-like feature
is calculated.
This difference is then compared to a learned threshold that separates non-objects from objects.
Because such a Haar-like feature is only a weak learner or classifier (its detection quality is
slightly better than random guessing) a large number of Haar-like features are necessary to
describe an object with sufficient accuracy.
In the Viola–Jones object detection framework, the Haar-like features are therefore organized
in something called a classifier cascade to form a strong learner or classifier.
The key advantage of a Haar-like feature over most other features is its calculation speed. Due
to the use of integral images, a Haar-like feature of any size can be calculated in constant time
(approximately 60 microprocessor instructions for a 2-rectangle feature).
A simple rectangular Haar-like feature can be defined as the difference of the sum of pixels of
areas inside the rectangle, which can be at any position and scale within the original image.
This modified feature set is called 2-rectangle feature.
Viola and Jones also defined 3-rectangle features and 4-rectangle features. The values indicate
certain characteristics of a particular area of the image. Each feature type can indicate the
existence (or absence) of certain characteristics in the image, such as edges or changes in
texture. For example, a 2-rectangle feature can indicate where the border lies between a dark
region and a light region
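To make the rectangle-feature idea concrete, the following sketch computes a 2-rectangle feature value directly from pixel sums. This is an illustrative Python example, not the thesis code (which uses MATLAB), and the function names and the tiny image are hypothetical:

```python
# Sketch: value of a 2-rectangle Haar-like feature computed naively from
# pixel sums. In practice this is done via the integral image (next section).

def pixel_sum(img, x, y, w, h):
    """Sum of pixel values in the w-by-h rectangle with top-left (x, y)."""
    return sum(img[row][col]
               for row in range(y, y + h)
               for col in range(x, x + w))

def two_rect_feature(img, x, y, w, h):
    """Left half minus right half: responds to a vertical dark/light edge."""
    half = w // 2
    left = pixel_sum(img, x, y, half, h)
    right = pixel_sum(img, x + half, y, half, h)
    return left - right

# Hypothetical 4x4 image: bright left half, dark right half (a strong edge).
img = [[9, 9, 1, 1],
       [9, 9, 1, 1],
       [9, 9, 1, 1],
       [9, 9, 1, 1]]
print(two_rect_feature(img, 0, 0, 4, 4))  # large positive value: edge present
```

A uniform image would give a value near zero for the same feature, which is why a threshold on this value can act as a weak classifier.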
One of the contributions of Viola and Jones was to use summed-area tables [3], which they called integral images. An integral image can be defined as a two-dimensional lookup table in the form of a matrix with the same size as the original image, where each element contains the sum of all pixels located in the up-left region of the original image (relative to the element's position). This allows the sum of any rectangular area in the image, at any position or scale, to be computed using only four lookups:
Finding the sum of the shaded rectangular area then requires only four values of the integral image I:

sum = I(D) - I(B) - I(C) + I(A)        (eq 3.1)

where the points A, B, C and D are the corners of the rectangle in the integral image, as shown in the figure.
Each Haar-like feature may need more than four lookups, depending on how it was defined.
Viola and Jones's 2-rectangle features need six lookups, 3-rectangle features need eight
lookups, and 4-rectangle features need nine lookups.
Figure 11: Flowchart for facial recognition
Figure 12: Finding sum of the shaded rectangular area.
3.3 Integral Image
A summed area table is a data structure and algorithm for quickly and efficiently generating
the sum of values in a rectangular subset of a grid. In the image processing domain, it is also
known as an integral image. It was introduced to computer graphics in 1984 by Frank
Crow for use with mipmaps. In computer vision it was popularized by Lewis and then given
the name "integral image" and prominently used within the Viola–Jones object detection
framework in 2001.
Historically, this principle is very well known in the study of multi-dimensional probability
distribution functions, namely in computing 2D (or ND) probabilities (area under the
probability.
Algorithm: As the name suggests, the value at any point (x, y) in the summed-area table is just the sum of all the pixels above and to the left of (x, y), inclusive:

I(x, y) = sum of i(x', y') over all x' <= x and y' <= y        (eq 3.2)

Moreover, the summed-area table can be computed efficiently in a single pass over the image, using the fact that the value in the summed-area table at (x, y) is just:

I(x, y) = i(x, y) + I(x-1, y) + I(x, y-1) - I(x-1, y-1)        (eq 3.3)
Once the summed-area table has been computed, evaluating the sum of intensities over any rectangular area requires only four array references. This allows for a constant calculation time that is independent of the size of the rectangular area. Using the notation of Figure 12, with A = (x0, y0), B = (x1, y0), C = (x0, y1) and D = (x1, y1), the sum of i(x, y) over the rectangle spanned by A, B, C and D is:

sum = I(D) - I(B) - I(C) + I(A)        (eq 3.4)
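Equations 3.2-3.4 translate directly into code. The sketch below is an illustrative Python version (the thesis itself works in MATLAB): it builds the summed-area table in one pass and then evaluates a rectangle sum with four lookups.

```python
# Summed-area table (integral image): single-pass construction (eq 3.3)
# and constant-time rectangle sums using four lookups (eq 3.4).

def integral_image(img):
    """I[y][x] = sum of img over all pixels up-left of (x, y), inclusive."""
    h, w = len(img), len(img[0])
    I = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            I[y][x] = (img[y][x]
                       + (I[y][x - 1] if x > 0 else 0)            # I(x-1, y)
                       + (I[y - 1][x] if y > 0 else 0)            # I(x, y-1)
                       - (I[y - 1][x - 1] if x > 0 and y > 0 else 0))
    return I

def rect_sum(I, x0, y0, x1, y1):
    """Sum over the rectangle with top-left (x0, y0) and bottom-right
    (x1, y1), inclusive: four lookups regardless of rectangle size."""
    total = I[y1][x1]
    if x0 > 0:
        total -= I[y1][x0 - 1]
    if y0 > 0:
        total -= I[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += I[y0 - 1][x0 - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
I = integral_image(img)
print(rect_sum(I, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

The boundary checks stand in for the convention I(-1, y) = I(x, -1) = 0; a real implementation often pads the table with a row and column of zeros instead.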
3.4 Ada Boost Training algorithm:
Ada Boost, short for "Adaptive Boosting", is a machine learning meta-algorithm formulated
by Yoav Freund and Robert Schapire who won the Gödel Prize in 2003 for their work. It can
be used in conjunction with many other types of learning algorithms to improve their
performance.
The output of the other learning algorithms ('weak learners') is combined into a weighted sum
that represents the final output of the boosted classifier. Ada Boost is adaptive in the sense that
subsequent weak learners are tweaked in favor of those instances misclassified by previous
classifiers.
Ada Boost is sensitive to noisy data and outliers, though in some problems it can be less susceptible to overfitting than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (e.g., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.
While every learning algorithm will tend to suit some problem types better than others, and
will typically have many different parameters and configurations to be adjusted before
achieving optimal performance on a dataset, Ada Boost (with decision trees as the weak
learners) is often referred to as the best out-of-the-box classifier.
When used with decision tree learning, information gathered at each stage of the Ada Boost
algorithm about the relative 'hardness' of each training sample is fed into the tree growing
algorithm such that later trees tend to focus on harder-to-classify examples.
Problems in machine learning often suffer from the curse of dimensionality — each sample
may consist of a huge number of potential features (for instance, there can be 162,336 Haar
features, as used by the Viola–Jones object detection framework, in a 24×24 pixel image
window), and evaluating every feature can reduce not only the speed of classifier training and
execution, but in fact reduce predictive power, per the Hughes Effect.
Unlike neural networks and SVMs, the Ada Boost training process selects only those features
known to improve the predictive power of the model, reducing dimensionality and potentially
improving execution time as irrelevant features do not need to be computed.
Training
Ada Boost refers to a particular method of training a boosted classifier. A boost classifier is a classifier of the form

F_T(x) = f_1(x) + f_2(x) + ... + f_T(x)        (eq 3.5)

where each f_t is a weak learner that takes an object x as input and returns a real-valued result indicating the class of the object. The sign of the weak learner's output identifies the predicted object class, and its absolute value gives the confidence in that classification. Similarly, the T-layer classifier is positive if the sample is believed to be in the positive class and negative otherwise.
Each weak learner produces an output hypothesis h(x_i) for each sample in the training set. At each iteration t, a weak learner is selected and assigned a coefficient α_t such that the total training error E_t of the resulting t-stage boost classifier is minimized:

E_t = Σ_i E[ F_(t-1)(x_i) + α_t h(x_i) ]        (eq 3.6)
Here F_(t-1)(x) is the boosted classifier that has been built up to the previous stage of training, E is some error function, and f_t(x) = α_t h(x) is the weak learner being considered for addition to the final classifier.
Weighting: At each iteration t of the training process, a weight w_(i,t) equal to the current error E(F_(t-1)(x_i)) is assigned to each sample in the training set. These weights can be used to inform the training of the weak learner; for instance, decision trees can be grown that favor splitting sets of samples with high weights.
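As a concrete, heavily simplified illustration of this training loop, the following Python sketch boosts one-dimensional threshold stumps on a toy dataset. It uses the standard discrete Ada Boost weight update rather than the exact variant in the Viola-Jones framework, and the dataset and function names are hypothetical:

```python
import math

# Toy discrete Ada Boost with 1-D threshold stumps, for illustration only.

def stump(threshold, polarity):
    """Weak learner h(x) in {-1, +1}."""
    return lambda x: polarity if x >= threshold else -polarity

def best_stump(xs, ys, w):
    """Pick the threshold/polarity pair with the lowest weighted error."""
    best = None
    for t in xs:
        for pol in (+1, -1):
            h = stump(t, pol)
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if h(xi) != yi)
            if best is None or err < best[0]:
                best = (err, h)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                    # uniform initial sample weights
    ensemble = []                        # list of (alpha_t, h_t)
    for _ in range(rounds):
        err, h = best_stump(xs, ys, w)
        err = max(err, 1e-10)            # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Re-weight: misclassified samples become "harder" (heavier).
        w = [wi * math.exp(-alpha * yi * h(xi))
             for xi, yi, wi in zip(xs, ys, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

xs = [1, 2, 3, 8, 9, 10]
ys = [-1, -1, -1, 1, 1, 1]              # separable toy labels
F = adaboost(xs, ys)
print([F(x) for x in xs])               # reproduces ys on this toy set
```

The re-weighting line is the code form of the weighting rule above: samples the current ensemble gets wrong receive larger weights, steering the next weak learner toward them.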
3.5 Cascading Amplifiers
A cascade amplifier is any two-port network constructed from a series of amplifiers, where
each amplifier sends its output to the input of the next amplifier in a daisy chain.[1]
The complication in calculating the gain of cascaded stages is the non-ideal coupling between
stages due to loading. Two cascaded common emitter stages are shown below. Because the
input resistance of the second stage forms a voltage divider with the output resistance of the
first stage, the total gain is not the product of the individual (separated) stages.
Figure 13: Cascading amplifier, a simplified diagram
On average, only 0.01% of all sub-windows are positive (faces), yet equal computation time is spent on every sub-window; most of the time should instead be spent only on potentially positive sub-windows. A simple 2-feature classifier can achieve almost a 100% detection rate with a 50% false positive rate, so it can act as the first layer of a series that filters out most negative windows; a second layer with 10 features can then tackle the "harder" negative windows which survived the first layer, and so on.
A cascade of gradually more complex classifiers achieves even better detection rates. The
evaluation of the strong classifiers generated by the learning process can be done quickly, but
it isn’t fast enough to run in real-time. For this reason, the strong classifiers are arranged in a
cascade in order of complexity, where each successive classifier is trained only on those
selected samples which pass through the preceding classifiers.
If at any stage in the cascade a classifier rejects the sub-window under inspection, no further processing is performed and the search continues with the next sub-window; the cascade therefore has the form of a degenerate tree. In the case of faces, the first classifier in the cascade, called the attentional operator, uses only two features to achieve a false negative rate of approximately 0% and a false positive rate of 40%. The effect of this single classifier is to reduce by roughly half the number of times the entire cascade is evaluated.
In cascading, each stage consists of a strong classifier. So all the features are grouped into
several stages where each stage has certain number of features.
The job of each stage is to determine whether a given sub-window is definitely not a face or
may be a face. A given sub-window is immediately discarded as not a face if it fails in any of
the stages.
A simple framework for cascade training is given below:
User selects values for f, the maximum acceptable false positive rate per layer and d, the
minimum acceptable detection rate per layer.
User selects target overall false positive rate Ftarget.
P = set of positive examples
N = set of negative examples.
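The framework above can be written out as a loop. This is an illustrative Python skeleton, not the thesis implementation (which relies on MATLAB's pretrained detector); the train_stage and false_positive_rate callables are hypothetical placeholders standing in for real Ada Boost training and evaluation:

```python
# Skeleton of the cascade-training framework described above.
# f: maximum acceptable false positive rate per layer,
# d: minimum acceptable detection rate per layer,
# F_target: overall false positive rate to reach.

def train_cascade(P, N, f, d, F_target, train_stage, false_positive_rate):
    """Returns the list of trained stages."""
    F = 1.0                  # current overall false positive rate
    cascade = []
    while F > F_target and N:
        stage = train_stage(P, N, f, d)        # boost features for one layer
        cascade.append(stage)
        F *= false_positive_rate(stage, N)     # layers multiply FP rates
        # Keep only negatives that the cascade still (wrongly) accepts.
        N = [n for n in N if all(s(n) for s in cascade)]
    return cascade

# Toy demonstration: "samples" are numbers, "faces" are values >= 10, and a
# trained stage simply learns the one threshold that keeps every positive.
def toy_train_stage(P, N, f, d):     # f, d unused in this toy placeholder
    t = min(P)
    return lambda x: x >= t

def toy_fp_rate(stage, N):
    return sum(stage(n) for n in N) / len(N)

P = [10, 12, 15]
N = [1, 2, 3]
cascade = train_cascade(P, N, 0.5, 0.99, 0.01, toy_train_stage, toy_fp_rate)
print(len(cascade))          # one layer suffices on this separable toy data
```

In a real cascade, each call to train_stage would add boosted Haar-feature classifiers until the layer meets the per-layer targets f and d on a validation set.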
CHAPTER 4 Software Implementation
4.1 MATLAB:
The software used for implementing the approach was MATLAB. MATLAB (matrix laboratory) is capable of performing varied mathematical operations and offers a fourth-generation programming language in which to implement an approach. It allows a user to plot functions and data, implement algorithms, create user interfaces, and interface with programs written in other languages. It has numerous toolboxes to work with, depending on the type of problem at hand, and has found users not only in engineering but also in economics and science. It keeps improving by launching newer, enhanced versions with novel tools to discover and put to use for varied purposes.
MATLAB was developed by Cleve Moler in the 1970s to give his students a better option for working with LINPACK and EISPACK without having to learn Fortran. Researchers such as Jack Little made much-needed improvements, and by the year 2000 newer libraries had been added for matrix manipulation.
The application is built around the MATLAB scripting language; usual usage involves writing commands in the command window, which executes the written code. In MATLAB we have worked within the neural network toolbox, wherein face recognition was performed for the image.
4.2 Open Source Computer Vision
OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision, originally developed by Intel's research center in Nizhny Novgorod (Russia), later supported by Willow Garage and now maintained by Itseez. The library is cross-platform and free for use under the open-source BSD license.
Officially launched in 1999, the OpenCV project was initially an Intel Research initiative to
advance CPU-intensive applications, part of a series of projects including real-time ray
tracing and 3D display walls. The main contributors to the project included a number of
optimization experts in Intel Russia, as well as Intel’s Performance Library Team. In the early
days of OpenCV, the goals of the project were described as:
Advance vision research by providing not only open but also optimized code for basic
vision infrastructure. No more reinventing.
Disseminate vision knowledge by providing a common infrastructure that developers
could build on, so that code would be more readily readable and transferable.
Advance vision-based commercial applications by making portable, performance-optimized code available for free, with a license that did not require the applications themselves to be open or free.
The first alpha version of OpenCV was released to the public at the IEEE Conference on
Computer Vision and Pattern Recognition in 2000, and five betas were released between 2001
and 2005. The first 1.0 version was released in 2006. In mid-2008, OpenCV obtained corporate
support from Willow Garage, and is now again under active development. A version 1.1 "pre-
release" was released in October 2008.
The second major release of OpenCV came in October 2009. OpenCV 2 includes major
changes to the C++ interface, aiming at easier, more type-safe patterns, new functions, and
better implementations for existing ones in terms of performance (especially on multi-core
systems). Official releases now occur every six months and development is now done by an
independent Russian team supported by commercial corporations.
In August 2012, support for OpenCV was taken over by a non-profit foundation OpenCV.org,
which maintains a developer[4] and user site.
OpenCV is written in C++ and its primary interface is in C++, but it still retains a less
comprehensive though extensive older C interface. There are bindings in Python, Java and
MATLAB/OCTAVE. The API for these interfaces can be found in the online
documentation.[6] Wrappers in other languages such as C#, Perl, Ch, and Ruby have been
developed to encourage adoption by a wider audience. All of the new developments and
algorithms in OpenCV are now developed in the C++ interface.
4.3 Computer Vision Tools:
Computer Vision System Toolbox provides algorithms, functions, and apps for designing and
simulating computer vision and video processing systems. You can perform feature detection,
extraction, and matching; object detection and tracking; motion estimation; and video
processing. For 3-D computer vision, the system toolbox supports camera calibration, stereo
vision, 3-D reconstruction, and 3-D point cloud processing. With machine learning based
frameworks, you can train object detection, object recognition, and image retrieval systems.
Algorithms are available as MATLAB functions, System objects, and Simulink blocks. For
rapid prototyping and embedded system design, the system toolbox supports fixed-point
arithmetic and C-code generation.
Key Features:
Object detection and tracking, including the Viola-Jones, Kanade-Lucas-Tomasi (KLT), and
Kalman filtering methods
Training of object detection, object recognition, and image retrieval systems, including cascade
object detection and bag-of-features methods
Camera calibration for single and stereo cameras, including automatic checkerboard detection
and an app for workflow automation
Stereo vision, including rectification, disparity calculation, and 3-D reconstruction
3-D point cloud processing, including I/O, visualization, registration, denoising, and geometric
shape fitting
Feature detection, extraction, and matching
Support for C-code generation and fixed-point arithmetic with code generation products.
Object Detection and Recognition: Object detection and recognition are used to locate, identify,
and categorize objects in images and video. Computer Vision System Toolbox provides a
comprehensive suite of algorithms and tools for object detection and recognition.
Object Classification: You can detect or recognize an object in an image by training an object
classifier using pattern recognition algorithms that create classifiers based on training data from
different object classes. The classifier accepts image data and assigns the appropriate object or
class label.
Computer Vision System Toolbox provides algorithms to train image classification and image
retrieval systems using the bag-of-words model. The system toolbox also provides feature
extraction techniques you can use to create custom classifiers using supervised classification
algorithms from Statistics and Machine Learning Toolbox.
4.4 Vision Cascade Object Detector:
Detect objects using the Viola-Jones algorithm. The cascade object detector uses the Viola-Jones algorithm to detect people's faces, noses, eyes, mouths, or upper bodies. One can also use the Training Image Labeler to train a custom classifier to use with this System object.
Construction
detector = vision.CascadeObjectDetector creates a System object, detector, that detects objects using the Viola-Jones algorithm. The ClassificationModel property controls the type of object to detect. By default, the detector is configured to detect faces.
detector = vision.CascadeObjectDetector(MODEL) creates a System object, detector, configured to detect objects defined by the input string, MODEL. The MODEL input describes the type of object to detect; there are several valid MODEL strings, such as 'FrontalFaceCART', 'UpperBody', and 'ProfileFace'.
detector = vision.CascadeObjectDetector(XMLFILE) creates a System object, detector, and configures it to use the custom classification model specified with the XMLFILE input. The XMLFILE can be created using the trainCascadeObjectDetector function or OpenCV (Open Source Computer Vision) training functionality.
You must specify a full or relative path to the XMLFILE if it is not on the MATLAB path.
detector = vision.CascadeObjectDetector(Name,Value) configures the cascade object detector properties, specified as one or more name-value pair arguments. Unspecified properties have default values.
To detect a feature:
1. Define and set up your cascade object detector using the constructor.
2. Call the step method with the input image, I, the cascade object detector object, detector, points PTS, and any optional properties. See the syntax below for using the step method.
Use the step syntax with the input image, I, the selected cascade object detector object, and any optional properties to perform detection.
BBOX = step(detector, I) returns BBOX, an M-by-4 matrix defining M bounding boxes containing the detected objects. This method performs multiscale object detection on the input image, I. Each row of the output matrix, BBOX, contains a four-element vector, [x y width height], that specifies in pixels the upper-left corner and size of a bounding box. The input image, I, must be a grayscale or truecolor (RGB) image.
BBOX = step(detector, I, roi) detects objects within the rectangular search region specified by roi. You must specify roi as a 4-element vector, [x y width height], that defines a rectangular region of interest within image I. Set the 'UseROI' property to true to use this syntax.
Why Train a Detector?
The vision.CascadeObjectDetector System object comes with several pretrained classifiers for detecting frontal faces, profile faces, noses, eyes, and the upper body. However, these classifiers are not always sufficient for a particular application, so Computer Vision System Toolbox provides the trainCascadeObjectDetector function to train a custom classifier.
What Kinds of Objects Can You Detect?: The Computer Vision System Toolbox cascade object detector can detect object categories whose aspect ratio does not vary significantly. Objects whose aspect ratio remains approximately fixed include faces, stop signs, and cars viewed from one side. The vision.CascadeObjectDetector System object detects objects in images by sliding a window over the image. The detector then uses a cascade classifier to decide whether the window contains the object of interest. The size of the window varies to detect objects at different scales, but its aspect ratio remains fixed. The detector is very sensitive to out-of-plane rotation, because the aspect ratio changes
for most 3-D objects. Thus, you need to train a detector for each orientation of the object.
Training a single detector to handle all orientations will not work.
Figure 14: Training cascade detector
How Does the Cascade Classifier Work?
The cascade classifier consists of stages, where each stage is an ensemble of weak learners.
The weak learners are simple classifiers called decision stumps. Each stage is trained using a
technique called boosting. Boosting provides the ability to train a highly accurate classifier by
taking a weighted average of the decisions made by the weak learners.
Each stage of the classifier labels the region defined by the current location of the sliding
window as either positive or negative. Positive indicates that an object was found, and
negative indicates that no object was found. If the label is negative, the classification of this
region is complete, and the detector slides the window to the next location. If the label is
positive, the classifier passes the region to the next stage. The detector reports an object found
at the current window location when the final stage classifies the region as positive.
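The stage-by-stage decision just described can be sketched as follows. This is a toy illustration, not the real detector: the stump scoring functions and thresholds are invented values standing in for trained Haar-feature classifiers.

```python
# Hypothetical sketch of how a cascade evaluates one window: each stage is a
# boosted ensemble of weak learners (decision stumps); a negative result at
# any stage rejects the window immediately, so most windows exit early.
def evaluate_window(window, stages):
    """stages: list of (stumps, threshold); each stump is (score_fn, weight)."""
    for stumps, threshold in stages:
        score = sum(weight * score_fn(window) for score_fn, weight in stumps)
        if score < threshold:
            return False          # rejected: stop, slide to the next window
    return True                   # passed every stage: object found here

# Toy example: two stages, each with one made-up "stump" on pixel statistics.
stages = [
    ([(lambda w: sum(w) / len(w), 1.0)], 0.4),   # stage 1: mean brightness
    ([(lambda w: max(w), 1.0)], 0.8),            # stage 2: peak brightness
]
print(evaluate_window([0.9, 0.8, 0.7], stages))  # passes both stages -> True
print(evaluate_window([0.1, 0.2, 0.1], stages))  # fails first stage -> False
```

The early-exit structure is the point: a cheap first stage discards most negative windows before any later, more selective stage runs.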
The stages are designed to reject negative samples as fast as possible. The assumption is that
the vast majority of windows do not contain the object of interest. Conversely, true positives
are rare and worth taking the time to verify.
A true positive occurs when a positive sample is correctly classified.
A false positive occurs when a negative sample is mistakenly classified as positive.
A false negative occurs when a positive sample is mistakenly classified as negative.
To work well, each stage in the cascade must have a low false negative rate. If a stage
incorrectly labels an object as negative, the classification stops, and you cannot correct the
mistake. However, each stage can have a high false positive rate. Even if the detector
incorrectly labels a non-object as positive, you can correct the mistake in subsequent stages.
The overall false positive rate of the cascade classifier is f^s, where f is the false positive rate
per stage in the range (0, 1) and s is the number of stages. Similarly, the overall true positive
rate is t^s, where t is the true positive rate per stage in the range (0, 1]. Thus, adding more stages
reduces the overall false positive rate, but it also reduces the overall true positive rate.
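A quick numeric check of these formulas, using assumed per-stage rates (f = 0.5, t = 0.99 are illustrative values, not measured ones):

```python
# Overall cascade rates: with per-stage false positive rate f and true
# positive rate t, a cascade of s stages has overall rates f**s and t**s.
def cascade_rates(f, t, s):
    return f ** s, t ** s

# Example: 20 stages that each pass half the negatives but 99% of positives.
fpr, tpr = cascade_rates(f=0.5, t=0.99, s=20)
print(f"overall FPR = {fpr:.2e}, overall TPR = {tpr:.3f}")
```

With these assumed numbers the false positive rate collapses to roughly 1e-6 while the true positive rate only drops to about 0.82, which is why the cascade trades many mediocre stages for one strong classifier.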
4.5 Neural Network Tools:
The Neural Network Toolbox provides algorithms, functions, and apps to create, train,
visualize, and simulate neural networks. It supports regression, clustering, pattern
recognition, dimensionality reduction, dynamic system modeling, and control. The toolbox
includes convolutional neural networks and deep learning algorithms for a variety of tasks.
The Parallel Computing Toolbox speeds up training on large data sets by spreading the
computation across multicore processors and GPUs. Among its features, the toolbox supports
supervised and unsupervised learning algorithms and offers pre- and post-processing for
better training efficiency and enhanced network performance.
Under network architecture it offers supervised and unsupervised networks. Supervised
networks come in four types: feedforward networks, radial basis networks, learning vector
quantization, and dynamic networks. We have used feedforward networks to compute our
data; these have one-way connections and no feedback mechanism.
Feedforward networks support nonlinear function fitting, pattern recognition, and prediction.
The supported feedforward variants include feedforward backpropagation, cascade-forward
backpropagation, feedforward input-delay backpropagation, and linear and perceptron
networks.
4.6 Neural Network Fitting tool
nftool opens the neural network fitting tool GUI, which leads you through solving a data
fitting problem with a two-layer feed-forward network trained with Levenberg-Marquardt.
Neural networks are good at fitting functions. In fact, there is proof that a fairly simple neural
network can fit any practical function.
Suppose, for instance, that you have data from a housing application. You want to design a
network that can predict the value of a house (in $1000s), given 13 pieces of geographical and
real estate information. You have a total of 506 example homes for which you have those 13
items of data and their associated market values.
You can solve this problem in two ways:
Use a graphical user interface, nftool, as described in Using the Neural Network Fitting Tool.
Use command-line functions, as described in Using Command-Line Functions.
It is generally best to start with the GUI, and then to use the GUI to automatically generate
command-line scripts. Before using either method, first define the problem by selecting a data
set. Each GUI has access to many sample data sets that you can use to experiment with the
toolbox (see Neural Network Toolbox Sample Data Sets). If you have a specific problem that
you want to solve, you can load your own data into the workspace. The next section describes
the data format.
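As a rough illustration of the kind of two-layer network nftool constructs, the following sketch fits a one-dimensional function with one tanh hidden layer and a linear output. It is written in Python rather than MATLAB, the fitting problem is invented, and it trains with plain gradient descent for brevity where nftool would use Levenberg-Marquardt.

```python
import numpy as np

# Minimal two-layer feed-forward fitting sketch: tanh hidden layer, linear
# output, full-batch gradient descent on mean squared error. Not nftool
# itself; the data (y = x^2 on [-1, 1]) is a made-up stand-in.
rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 64).reshape(1, -1)   # 1 input feature, 64 samples
T = X ** 2                                  # target function to fit

W1 = rng.normal(0, 0.5, (8, 1)); b1 = np.zeros((8, 1))  # 8 hidden neurons
W2 = rng.normal(0, 0.5, (1, 8)); b2 = np.zeros((1, 1))  # linear output

lr = 0.1
for _ in range(10000):
    H = np.tanh(W1 @ X + b1)                # hidden layer activations
    Y = W2 @ H + b2                         # network output
    E = Y - T                               # output error
    dY = 2 * E / X.shape[1]                 # gradient of mean squared error
    dW2 = dY @ H.T; db2 = dY.sum(axis=1, keepdims=True)
    dH = (W2.T @ dY) * (1 - H ** 2)         # backpropagate through tanh
    dW1 = dH @ X.T; db1 = dH.sum(axis=1, keepdims=True)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse = float(np.mean((W2 @ np.tanh(W1 @ X + b1) + b2 - T) ** 2))
print(f"final MSE = {mse:.5f}")
```

Even this small hidden layer drives the mean squared error well below the variance of the target, which is the sense in which "a fairly simple neural network can fit any practical function".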
4.7 Training Algorithm
The toolbox supports training and learning functions, which are ways to adjust the numerical
weights and biases associated with a network. A training function is a global algorithm that
affects the weights and biases of the whole network, whereas a learning function operates on
individual weights and biases.
The algorithms supported by the Neural Network Toolbox include the Levenberg-Marquardt
algorithm and resilient backpropagation, along with many gradient descent and conjugate
gradient methods. The modular framework also helps in developing custom training
algorithms. When training the network, error weights can be used to indicate the relative
importance of matching particular desired outputs.
The algorithms can be accessed directly from the command line or via apps that show
diagrammatic representations of the networks in use and provide features to monitor and train
a specific network. A suite of learning functions, including gradient descent, Hebbian
learning, LVQ, Widrow-Hoff, and Kohonen, is also provided.
trainlm is a network training function that updates weight and bias values according to
Levenberg-Marquardt optimization. trainlm is often the fastest backpropagation algorithm in
the toolbox, and is highly recommended as a first-choice supervised algorithm, although it does
require more memory than other algorithms.
trainlm supports training with validation and test vectors if the
network's NET.divideFcn property is set to a data division function. Validation vectors are used
to stop training early if the network performance on the validation vectors fails to improve or
remains the same for max_fail epochs in a row. Test vectors are used as a further check that
the network is generalizing well, but do not have any effect on training.
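The max_fail validation rule described above can be sketched as a simple counter. This is an illustrative Python sketch, not trainlm's implementation, and the validation error sequence is invented:

```python
# Hedged sketch of validation-based early stopping: training halts once the
# validation error has failed to improve for max_fail consecutive epochs.
def train_with_early_stop(val_errors, max_fail=6):
    """val_errors: per-epoch validation errors; returns the epoch we stop at."""
    best = float("inf")
    fails = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, fails = err, 0     # improvement: reset the failure counter
        else:
            fails += 1               # no improvement this epoch
            if fails >= max_fail:
                return epoch         # stop: max_fail epochs without progress
    return len(val_errors) - 1       # ran out of epochs before triggering

# Invented validation-error trace: improves, dips once more, then worsens.
errs = [1.0, 0.8, 0.6, 0.61, 0.62, 0.59, 0.65, 0.66, 0.7, 0.71, 0.72]
print(train_with_early_stop(errs, max_fail=3))  # -> 8
```

Note the counter resets at epoch 5 when the error dips below the previous best, so stopping only triggers after three consecutive non-improving epochs.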
trainlm can train any network as long as its weight, net input, and transfer functions have
derivative functions.
Backpropagation is used to calculate the Jacobian jX of performance perf with respect to the
weight and bias variables X. Each variable is adjusted according to Levenberg-Marquardt,
jj = jX * jX (eq 4.1)
je = jX * E (eq 4.2)
dX = -(jj + I*mu) \ je (eq 4.3)
where E is all errors and I is the identity matrix.
The adaptive value mu is increased by mu_inc until the change above results in a reduced
performance value. The change is then made to the network, and mu is decreased by mu_dec.
The parameter mem_reduc indicates how to trade memory against speed when calculating the
Jacobian jX.
If mem_reduc is 1, then trainlm runs the fastest, but can require a lot of memory.
Increasing mem_reduc to 2 cuts some of the memory required by a factor of two, but
slows trainlm somewhat. Higher values continue to decrease the amount of memory needed
but increase training time.
Training stops when any of these conditions occurs:
The maximum number of epochs (repetitions) is reached.
The maximum amount of time is exceeded.
Performance is minimized to the goal.
The performance gradient falls below min_grad.
mu exceeds mu_max.
Validation performance has increased more than max_fail times since the last time it decreased
(when using validation).
In mathematics and computing, the Levenberg-Marquardt algorithm
(LMA), also known as the damped least-squares (DLS) method, is used to solve non-linear
least squares problems. These minimization problems arise especially in least squares curve
fitting. The LMA is used in many software applications for solving generic curve-fitting
problems. However, as for many fitting algorithms, the LMA finds only a local minimum,
which is not necessarily the global minimum.
The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method
of gradient descent. The LMA is more robust than the GNA, which means that in many cases
it finds a solution even if it starts very far off the final minimum. For well-behaved functions
and reasonable starting parameters, the LMA tends to be a bit slower than the GNA. LMA can
also be viewed as Gauss–Newton using a trust region approach.
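Eqs. 4.1-4.3 and the mu adaptation can be illustrated on a toy one-parameter curve fit. This Python sketch is illustrative only: the model y = exp(b*x) and the data are invented, and real trainlm operates on network weights rather than a single parameter.

```python
import numpy as np

# Illustrative Levenberg-Marquardt iteration: for residual vector E with
# Jacobian J, solve (J'J + mu*I) dX = -J'E; raise mu when a step fails to
# reduce the error and lower it when a step succeeds.
def lm_fit(x, y, b0, mu=1e-3, iters=50):
    b = b0
    for _ in range(iters):
        E = np.exp(b * x) - y                   # residuals of y = exp(b*x)
        J = (x * np.exp(b * x)).reshape(-1, 1)  # Jacobian dE/db
        jj = J.T @ J                            # cf. eq 4.1
        je = J.T @ E                            # cf. eq 4.2
        dX = -np.linalg.solve(jj + mu * np.eye(1), je)  # cf. eq 4.3
        b_new = b + dX.item()
        if np.sum((np.exp(b_new * x) - y) ** 2) < np.sum(E ** 2):
            b, mu = b_new, mu * 0.1             # step accepted: less damping
        else:
            mu *= 10.0                          # step rejected: more damping
    return b

x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * x)            # noise-free data generated with b = 0.7
b = lm_fit(x, y, b0=0.0)
print(b)                       # converges toward 0.7
```

When mu is small the step is essentially Gauss-Newton; when mu is large the damping term dominates and the step shrinks toward gradient descent, which is the interpolation described above.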
4.8 Programming Code for Face Detection and its features
Code for face detection:
clear all
clc
%Detect objects using Viola-Jones Algorithm
%To detect Face
FDetect = vision.CascadeObjectDetector;
%Read the input image
I = imread('HarryPotter.jpg');
%Returns Bounding Box values based on number of objects
BB = step(FDetect,I);
figure,
imshow(I); hold on
for i = 1:size(BB,1)
rectangle('Position',BB(i,:),'LineWidth',5,'LineStyle','-','EdgeColor','r');
end
title('Face Detection');
hold off;
Figure 15: Faces detected by the algorithm are highlighted by bounding boxes
Code for Nose Detection:
%To detect Nose
NoseDetect = vision.CascadeObjectDetector('Nose','MergeThreshold',16);
BB=step(NoseDetect,I);
figure,
imshow(I); hold on
for i = 1:size(BB,1)
rectangle('Position',BB(i,:),'LineWidth',4,'LineStyle','-','EdgeColor','b');
end
title('Nose Detection');
hold off;
Figure 16 : Nose detected
Explanation:
To denote the object of interest as 'nose', the argument 'Nose' is passed.
vision.CascadeObjectDetector('Nose','MergeThreshold',16);
The default syntax for nose detection is:
vision.CascadeObjectDetector('Nose');
Code for Mouth Detection:
%To detect Mouth
MouthDetect = vision.CascadeObjectDetector('Mouth','MergeThreshold',16);
BB=step(MouthDetect,I);
figure,
imshow(I); hold on
for i = 1:size(BB,1)
rectangle('Position',BB(i,:),'LineWidth',4,'LineStyle','-','EdgeColor','r');
end
title('Mouth Detection');
hold off;
Figure 17: Mouth Detected
Code for eye detection:
%To detect Eyes
EyeDetect = vision.CascadeObjectDetector('EyePairBig');
%Read the input Image
I = imread('harry_potter.jpg');
BB=step(EyeDetect,I);
figure,imshow(I);
rectangle('Position',BB,'LineWidth',4,'LineStyle','-','EdgeColor','b');
title('Eyes Detection');
Eyes=imcrop(I,BB);
figure,imshow(Eyes);
Figure 18: Eyes detected
4.9 Procedure for face recognition:
After detecting faces and their features, we move on to the task of recognition, which is
completed by training the neural network. Here I am using different faces to show the
procedure of face recognition; one can continue with the same. So first we read the image
and convert it to grayscale or binary, as we will be making use of the equivalent histogram
for feeding the data into the neural network.
Code for image conversion from RGB to grayscale and binary:
I=imread('im111.jpg');
imshow(I)
J=rgb2gray(I);
subplot(1,3,1);imshow(I)
subplot(1,3,2);imshow(J)
K=im2bw(J);
subplot(1,3,3); imshow(K);
hold on;
Figure 19: sample images in rgb, gray and binary formats
Code for image segmentation
im=imread('im111.jpg');
n= fix(size(im,1)/2)
A=im(1:n,:,:);
B=im(n+1:end,:,:);
subplot(1,2,1);imshow(A);
subplot(1,2,2);imshow(B);
hold on;
Figure 20: Segmented image
One can carry out the same procedure by segmenting the image as well. I have shown the
segmentation of images, but recognition has been carried out with the grayscale equivalent
histogram.
Code for binary image histogram
I=imread('im111.jpg');
J=rgb2gray(I);
K=im2bw(J);
subplot(1,3,1);imshow(I);
subplot(1,3,2);imshow(J);
subplot(1,3,3);imshow(K);
imhist(K);
hold on;
Figure 21: Binary image histogram equivalent
Code for calculating histogram equivalent for a gray scale image
I=imread('im111.jpg');
J=rgb2gray(I);
K=im2bw(J);
subplot(1,3,1);imshow(I);
subplot(1,3,2);imshow(J);
subplot(1,3,3);imshow(K);
imhist(J);
hold on;
Figure 22: sample image 1 in RGB, sample image 2 in grayscale, and histogram equivalent
of sample image 2.
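The histogram feature extraction performed above with MATLAB's imhist can be sketched as follows. This is a Python illustration with a synthetic image standing in for the sample photograph; the normalization step is an assumption of the sketch, not something the MATLAB code above does explicitly.

```python
import numpy as np

# Sketch of the feature-extraction step: a grayscale image's 256-bin
# intensity histogram, normalized to sum to 1, serves as the input vector
# fed to the neural network. The image here is synthetic random data.
def gray_histogram(img):
    """img: 2-D uint8 array; returns a normalized 256-bin histogram."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    return hist / hist.sum()            # normalize so the bins sum to 1

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
h = gray_histogram(img)
print(h.shape, round(h.sum(), 6))       # (256,) 1.0
```

The appeal of the histogram as a feature is that it is a fixed-length vector regardless of image size, which makes it easy to feed into a fitting network.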
CHAPTER 5 Results and Discussion
For carrying out the training procedure a three-layer neural network is used. The training and
testing of data have been done using the neural network tool in MATLAB. We have used
feedforward networks under the supervised learning architecture of the Neural Network
Toolbox to compute our data, in which one-way connections operate and no feedback
mechanism is included.
5.1 Results of face recognition:
After opening the nftool we choose the data to be trained for the recognition procedure. That
data is usually imported from an Excel workbook.
Table 1: sample data for sample images 1, 2, 3
S.no Sample Input data Target data
1 1 10010101 10001111
2 2 11001100 11110000
3 3 10101100 10101011
After importing the data, the neural network tool starts the training, depicted by the diagram
shown below. It carries out training in three stages: training, validation and testing. In the
training stage the network is adjusted according to the error generated during it.
During validation, network generalization is monitored and used to halt the training. In
testing, the final solution is tested to confirm the predictive power of the network. Usually
many training sessions are conducted to enhance the performance of the network and lower
the mean squared error, which is the average squared difference between output and target.
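The mean squared error referred to here can be computed directly; the output and target values below are invented for illustration:

```python
# Mean squared error: the average squared difference between network
# outputs and their targets; lower is better.
def mse(outputs, targets):
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

# Invented example: (0.01 + 0.01 + 0.04) / 3 = 0.02
print(mse([0.9, 0.1, 0.8], [1.0, 0.0, 1.0]))
```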
Figure 23: A three stage training procedure
The performance plot shown below is a criterion for checking the performance of the
network; regression plots can be devised for the same purpose, but for convenience I have
stuck to performance plots.
Figure 24: Parameters calculated during training
Figure 25: Performance plot for sample data 1.
Table 2: sample data for sample image 2
Input data Target data
462 0 0 102 1100
342 0 0 78 0010
234 0 0 65 1001
500 0 0 132 1010
222 0 0 69 1011
165 0 0 45 0111
CHAPTER 6 Conclusion and Future Scope
6.1 Conclusion:
In this work it has been shown that, given a facial image of a person, the network is able to
recognize the face of that person. The whole work is completed through the following steps:
facial images of different persons have been collected for showing the detection and
recognition process in detail.
For detection purposes we used a group of people and detected their faces and features,
highlighting the detected regions using bounding boxes. For detection we used the
vision.CascadeObjectDetector package in the Computer Vision Toolbox of MATLAB 8.2,
while for recognition the Neural Network Toolbox was utilized, in which the feedforward
backpropagation architecture was employed.
In the neural network we have worked under the neural network fitting tool to train the facial
image data. The image taken for recognition was converted into grayscale and binary image
formats. Then, taking the grayscale image, we computed its histogram, from which the data
was compiled for the image recognition.
One can also do the same using the binary image. In the neural network fitting tool the
training was carried out in three stages, mainly training, validation and testing, which have
already been explained earlier in the report. The training procedure computed the error to
monitor the performance of the network. In validation, network generalization was the area
of concentration.
In testing, the final solution was obtained by carrying out the training session 5-10 times in
order to achieve better outcomes. Mean squared error was the parameter chosen to assess the
network's execution, and low mean squared error values were obtained. Thus, with these
results one can conclude that the given image has been recognized. The accuracy of this
network is approximately 85%.
6.2 Future Scope:
Face recognition systems used today work very well under constrained conditions, although all
systems work much better with frontal mug-shot images and constant lighting.
All current face recognition algorithms fail under the vastly varying conditions under which
humans need to and are able to identify other people. Next generation person recognition
systems will need to recognize people in real-time and in much less constrained situations.
I believe that identification systems that are robust in natural environments, in the presence of
noise and illumination changes, cannot rely on a single modality, so fusion with other
modalities is essential.
Technology used in smart environments has to be unobtrusive and allow users to act freely.
Wearable systems in particular require their sensing technology to be small, low powered and
easily integrable with the user's clothing.
Considering all the requirements, identification systems that use face recognition and speaker
identification seem to us to have the most potential for wide-spread application. Cameras and
microphones today are very small, light-weight and have been successfully integrated with
wearable systems.
Audio and video based recognition systems have the critical advantage that they use the
modalities humans use for recognition. Finally, researchers are beginning to demonstrate that
unobtrusive audio-and-video based person identification systems can achieve high recognition
rates without requiring the user to be in highly controlled environments. Sooner or later the
process of face recognition may become fully automated, using the neural network as its base
technique in combination with other techniques.
REFERENCES:
[1] H.A. Rowley, S. Baluja, and T. Kanade, “Neural network - based face detection,” IEEE Trans.Pattern Analysis
and Machine Intelligence, vol. 20, no. 1, pp. 23–38, Jan. 1998.
[2] H.A. Rowley, S. Baluja, and T. Kanade, "Rotation invariant neural network-based face detection," Proc. IEEE
Int'l Conf. Computer Vision and Pattern Recognition, pp. 38-44, 1998.
[3] K.K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998.
[4] H. Schneiderman and T. Kanade, “A statistical method for 3D object detection applied to faces and cars,”
Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 746–751, June 2000.
[5] K.C. Yow and R. Cipolla, “Feature-based human face detection,” Image and Vision Computing, vol. 25, no.
9, pp. 713–735, Sept. 1997.
[6] D. Maio and D. Maltoni, “Real-time face location on gray-scale static images,” Pattern Recognition, vol. 33,
no. 9, pp. 1525–1539, Sept. 2000.
[7] M.S. Lew and N. Huijsmans, “Information theory and face detection,” Proc.IEEE Int’l Conf. Pattern
Recognition, pp. 601- 605, Aug. 1996.
[8] S.C. Dass and A.K. Jain, “Markov face models,” Proc. IEEE Int’l Conf.Computer Vision, pp. 680–687, July
2001.
[9] A.J. Colmenarez and T.S. Huang, “Face detection with information based maximum discrimination,” Proc.
IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 782–787, June 1997.
[10] D. DeCarlo and D. Metaxas, “Optical flow constraints on deformable models with applications to face
tracking,” International Journal Computer Vision, vol. 38, no. 2, pp. 99–127, July 2000.
[11] V. Bakic and G. Stockman, “Menu selection by facial aspect,” Proc. Vision Interface, Canada, pp. 203–209,
May 1999.
[12] A. Colmenarez, B. Frey, and T. Huang, “Detection and tracking of faces and facial features,” Proc. IEEE
Int’l Conf. Image Processing, pp. 657–661, Oct. 1999.
[13] R. F´eraud, O.J. Bernier, J.-E. Viallet, and M. Collobert, “A fast and accurate face detection based on neural
network,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23.
[14] W. Zhao, R. Chellappa, P.J. Phillips and A. Rosenfeld, "Face Recognition: A Literature Survey," ACM
Comput. Surv., vol. 35, no. 4, pp. 399-458, 2003.
[15] M.A. Turk and A.P. Pentland, "Face Recognition Using Eigenfaces," IEEE Conf. on Computer Vision and
Pattern Recognition, pp. 586-591, 1991.
[16] Ling-Zhi Liao, Si-Wei Luo, and Mei Tian, ""Whitened Faces" Recognition With PCA and ICA," IEEE Signal
Processing Letters, vol. 14, no. 12, pp. 1008-1011, Dec. 2007.
[17] G. Jarillo, W. Pedrycz, M. Reformat, "Aggregation of classifiers based on image transformations in biometric
face recognition," Machine Vision and Applications, vol. 19, pp. 125-140, Springer-Verlag, 2008.
[18] Tej Pal Singh, “Face Recognition by using Feed Forward Back Propagation Neural Network”, International
Journal of Innovative Research in Technology & Science, vol.1, no.1.
[19] N.Revathy, T.Guhan, “Face recognition system using back propagation artificial neural networks”,
International Journal of Advanced Engineering Technology, vol.3, no. 1, 2012.
[20] Kwan-Ho Lin, Kin-Man Lam, and Wan-Chi Siu, "A New Approach using Modified Hausdorff Distances
with EigenFace for Human Face Recognition," IEEE Seventh International Conference on Control, Automation,
Robotics and Vision, Singapore, 2-5 Dec., pp. 980-984, 2002.
[21] Zdravko Liposcak, Sven Loncaric, "Face Recognition From Profiles Using Morphological Operations,"
IEEE Conference on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 1999.
[22] Simone Ceolin , William A.P Smith, Edwin Hancock, “Facial Shape Spaces from Surface Normals and
Geodesic Distance”, Digital Image Computing Techniques and Applications, 9th Biennial Conference of
Australian Pattern Recognition Society IEEE 3-5 Dec., Glenelg, pp-416-423, 2007
[23] Ki-Chung Chung, Seok Cheol Kee, and Sang Ryong Kim, "Face Recognition using Principal Component
Analysis of Gabor Filter Responses," Proc. International Workshop on Recognition, Analysis, and Tracking of
Faces and Gestures in Real-Time Systems, Corfu, 26-27 September, pp. 53-57, 1999.
[24] Varsha Gupta, Dipesh Sharma, "A Study of Various Face Detection Methods," International Journal
of Advanced Research in Computer and Communication Engineering, vol. 3, no. 5, May 2014.
[25] Irene Kotsia, Iaonnis Pitas, “Facial expression recognition in image sequences using geometric deformation
features and support vector machines”, IEEE transaction paper on image processing, vol. 16, no.1, pp-172-187,
January 2007.
[26] R.Rojas,”The back propagation algorithm”,Springer-Verlag, Neural networks, pp 149-182 ,1996.
[27] Hosseien Lari-Najaffi , Mohammad Naseerudin and Tariq Samad,”Effects of initial weights on back
propagation and its variations”, Systems, Man and Cybernetics ,Conference Proceedings, IEEE International
Conference, 14-17 Nov ,Cambridge, pp-218-219,1989.
[28] M.H. Ahmad Fadzil, H. Abu Bakar, "Human face recognition using neural networks," Proc. ICIP-94, IEEE
International Conference on Image Processing, Austin, 13-16 November, pp. 936-939, 1994.
[29] N.K Sinha , M.M Gupta and D.H Rao, “Dynamic Neural Networks -an overview”, Industrial Technology
2000,Proceedings of IEEE International Conference,19-22 Jan, , pp-491-496, 2000
[30] Prachi Agarwal, Naveen Prakash, "An Efficient Back Propagation Neural Network Based Face Recognition
System Using Haar Wavelet Transform and PCA," International Journal of Computer Science and Mobile
Computing, vol. 2, no. 5, pp. 386-395, May 2013.
[31] Dibber, "Backpropagation," https://en.wikipedia.org/wiki/Backpropagation, 4 Jan 2005, accessed 20
September 2015.
[32] "Artificial Neural Networks," https://en.wikipedia.org/wiki/Artificial_neural_network, accessed online 2
October 2001.
[33] Michael Nielsen, "Neural networks and deep learning,"
http://neuralnetworksanddeeplearning.com/chap2.html, Jan 2016.
[34] Tyrell turing, "Feedforward neural network,"
https://en.wikipedia.org/wiki/Feedforward_neural_network, 7 April 2005.
Publications:
Journal Publications
Smriti Tikoo, Nitin Malik, "Detection of face using Viola Jones algorithm and
recognition using Backpropagation neural network," International Journal of
Computer Science and Mobile Computing, vol. 5, no. 5, pp. 288-295, May 2016.
Communicated: “Detection, segmentation and recognition of face and its features using
neural networks”, International Journal of Advanced Research in Computer and
Communication Engineering.