1. SIFT-BASED ARABIC SIGN LANGUAGE RECOGNITION (ArSL) SYSTEM
By
Alaa Tharwat1,3
And
Tarek Gaber2,3
1Faculty of Eng., Suez Canal University, Ismailia, Egypt
2Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt
3Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net
AECIA 2014, November 17-19, Addis Ababa, Ethiopia
4. Introduction: Why ArSL
• Helps vocally disabled people communicate freely.
• Provides an easy way to communicate with non-mute people.
• ArSL is the natural language of deaf people, just as spoken language is for hearing people.
5. Introduction: Aim of the work
Design a sign language recognition approach that transcribes sign gestures into meaningful text or speech, so that deaf and hearing people can communicate easily.
[Figure: example signs for the Arabic words الشارع (the street) and السيارة (the car)]
6. What is ArSL?
Translating ArSL to spoken language, i.e. translating hand gestures to Arabic characters.
Sign language hand formations:
• Hand shape
• Hand location
• Hand movement
• Hand orientation
7. Introduction: Types of ArSL
1- Vision-based Approach
Requires a special camera setup, plus some preprocessing and computation to extract features.
[Figure: Collect gestures → Extract features → Classification → Decision]
8. Introduction: Types of ArSL (Continued)
2- Electronic Glove-based Approach
Gloves are inconvenient to wear, but signals are easy to extract.
The electronic glove:
• Has 22 sensors
• Is lightweight
• Is flexible
10. Proposed Method: General Framework
[Figure: general framework. Training images and testing images → SIFT feature extraction (Difference of Gaussian pyramid → keypoint detection → unreliable keypoint elimination → orientation assignment → descriptor computation) → feature vectors → LDA → matching]
11. Proposed Method: General Framework
Training phase
• Collect all training images (i.e. gestures of Arabic Sign Language).
• Extract the features using SIFT.
• Represent each image by one feature vector.
• Apply a dimensionality reduction method (e.g. LDA) to reduce the number of features in the vector.
Testing phase
• Collect the testing image.
• Extract the features.
• Project the feature vector onto the LDA space.
• Apply machine learning techniques to classify the test feature vector, i.e. to decide which character the gesture represents.
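The LDA step used in both phases can be illustrated with a minimal sketch of Fisher's discriminant for two classes in two dimensions. This is not the authors' implementation: the slides use multi-class LDA on SIFT feature vectors, whereas the toy data, the restriction to two classes, and all function names below are assumptions made to keep the example self-contained.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def scatter(vectors, mu):
    """2x2 scatter matrix: sum of (x - mu)(x - mu)^T over one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Training: w = Sw^-1 (mu_a - mu_b), the two-class LDA projection axis."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(vector, w):
    """Testing: project a feature vector onto the LDA axis."""
    return vector[0] * w[0] + vector[1] * w[1]


# Toy 2-D "feature vectors" for two hypothetical gesture classes.
class_a = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.5]]
class_b = [[6.0, 1.0], [7.0, 2.0], [8.0, 2.5]]
w = fisher_direction(class_a, class_b)
proj_a = [project(v, w) for v in class_a]
proj_b = [project(v, w) for v in class_b]
```

After projection the two classes occupy disjoint ranges on a single axis, which is the property the classifier then relies on.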
12. Proposed Method: Feature Extraction
Feature Extraction: SIFT (Scale-Invariant Feature Transform)
The SIFT feature extraction algorithm consists of the following steps:
• Creating the Difference of Gaussian Pyramid (Scale-Space Peak Selection)
• Extrema Detection
• Unreliable Keypoints Elimination
• Orientation Assignment
• Descriptor Computation
[Figure: keypoints (extrema) extracted from one gesture image using the SIFT algorithm]
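The first two steps can be sketched in a few lines. The example below computes one Difference of Gaussian (DoG) layer and keeps pixels that are strict extrema among their 8 neighbours and exceed a threshold. It is a simplified illustration only: full SIFT builds a pyramid over several scales and octaves and also compares each pixel with the adjacent scales. The function names, the scale factor k = 1.6, and the threshold value are assumptions, not values taken from the slides.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel with radius of about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur with clamped (replicated) borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return max(0, min(n - 1, i))

    offsets = range(-radius, radius + 1)
    horizontal = [[sum(kernel[k + radius] * row[clamp(x + k, width)]
                       for k in offsets) for x in range(width)]
                  for row in image]
    return [[sum(kernel[k + radius] * horizontal[clamp(y + k, height)][x]
                 for k in offsets) for x in range(width)]
            for y in range(height)]


def difference_of_gaussians(image, sigma, k=1.6):
    """One DoG layer: blur at k*sigma minus blur at sigma."""
    fine, coarse = blur(image, sigma), blur(image, k * sigma)
    return [[c - f for f, c in zip(fine_row, coarse_row)]
            for fine_row, coarse_row in zip(fine, coarse)]


def local_extrema(dog, peak_thr):
    """Interior pixels that are strict 8-neighbour extrema with |value| > peak_thr."""
    found = []
    for y in range(1, len(dog) - 1):
        for x in range(1, len(dog[0]) - 1):
            value = dog[y][x]
            if abs(value) <= peak_thr:
                continue
            neighbours = [dog[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if value > max(neighbours) or value < min(neighbours):
                found.append((y, x))
    return found


# A 9x9 test image with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
dog = difference_of_gaussians(image, sigma=1.0)
keypoints = local_extrema(dog, peak_thr=0.05)
```

The only keypoint found is the bright pixel itself, where the DoG response is a strong minimum.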
13. Proposed Method: Feature Extraction
Feature Extraction: SIFT (Scale-Invariant Feature Transform)
[Figure: matching between two gestures based on SIFT features]
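One common way to match SIFT descriptors between two images is nearest-neighbour search with a ratio test, sketched below. The slides do not state which matching rule was used, so the ratio test, the 0.8 ratio, and the function names are assumptions; the 2-D descriptors are toy values (real SIFT descriptors are much longer).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (index_a, index_b) pairs that pass the ratio test.

    A descriptor in desc_a is matched to its nearest descriptor in desc_b
    only if that nearest distance is clearly smaller than the second nearest.
    """
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches


desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]]
desc_b = [[0.0, 1.0], [10.0, 9.0], [100.0, 100.0]]
matches = match_descriptors(desc_a, desc_b)
```

The third descriptor in desc_a is equally far from two candidates, so it is rejected as ambiguous; the number of surviving matches can then serve as a similarity score between two gesture images.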
14. Proposed Method: Feature Extraction
Feature Extraction: SIFT (Scale-Invariant Feature Transform)
The number of features extracted by SIFT depends on its parameters, which were considered in our experiments:
• Peak threshold (PeakThr)
• Patch size (Psize)
• Number of angles (Nangles) and number of bins (Nbins)
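To make the last two parameters concrete, here is a small sketch of a gradient orientation histogram. The slides do not define Nangles and Nbins; the sketch assumes the convention of common SIFT implementations, where Nangles is the number of orientation bins and Nbins is the number of spatial cells per patch side, so the descriptor length is Nangles × Nbins × Nbins. Function names and default values are illustrative.

```python
import math


def orientation_histogram(patch, n_angles=8):
    """Magnitude-weighted histogram of gradient orientations in a patch."""
    hist = [0.0] * n_angles
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Central differences approximate the image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * n_angles) % n_angles
            hist[bin_index] += magnitude
    return hist


def descriptor_length(n_angles=8, n_bins=4):
    """Descriptor size under the assumed Nangles x Nbins x Nbins layout."""
    return n_angles * n_bins * n_bins


# A horizontal intensity ramp: every gradient points along +x (angle 0).
ramp = [[float(x) for x in range(5)] for _ in range(5)]
hist = orientation_histogram(ramp)
```

With 8 angles and a 4 × 4 grid this layout gives the familiar 128-dimensional descriptor; changing either parameter changes the feature vector length.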
15. Proposed Method: Classification Techniques
We used the following classifiers to assess the performance of our approach:
• SVM: handles high-dimensional datasets well and gives very good results.
• k-NN: unknown patterns are classified based on their similarity to known samples.
• Nearest Neighbor: its idea is extremely simple, as it does not require learning.
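The two distance-based classifiers can be sketched as follows; nearest neighbor is simply k-NN with k = 1. The Euclidean distance, the toy 2-D vectors, and the label names are assumptions made for illustration. SVM is omitted because it needs an optimisation routine that does not fit in a short example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_vectors, train_labels, query, k=5):
    """Majority vote among the k training vectors closest to the query."""
    order = sorted(range(len(train_vectors)),
                   key=lambda i: euclidean(train_vectors[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def nearest_neighbor_predict(train_vectors, train_labels, query):
    """Label of the single closest training vector (k-NN with k = 1)."""
    return knn_predict(train_vectors, train_labels, query, k=1)


# Toy training set: two hypothetical character classes.
train_vectors = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_labels = ["alif", "alif", "alif", "baa", "baa", "baa"]

label_knn = knn_predict(train_vectors, train_labels, [0.2, 0.2], k=3)
label_nn = nearest_neighbor_predict(train_vectors, train_labels, [5.5, 5.5])
```

Neither classifier has a training step beyond storing the labelled vectors, which is why the slide describes the idea as requiring no learning.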
17. Experimental Results: Dataset
We used 210 gray-level images of size 200x200.
These images represent 30 Arabic characters, with 7 images for each character.
The images were collected under different illumination, rotation, quality levels, and degrees of image partiality.
[Figure: a sample of collected ArSL gestures representing different characters]
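The dataset layout (30 characters × 7 images = 210 images) lends itself to a per-class split into training and testing images. The sketch below takes the first n_train images of each class for training and the rest for testing. How the authors chose the images, and whether all remaining images were used for testing, is not stated in the slides, so this rule and the function name are assumptions.

```python
def split_dataset(n_classes=30, images_per_class=7, n_train=5):
    """Split (label, image_index) pairs into training and testing lists."""
    train, test = [], []
    for label in range(n_classes):
        for index in range(images_per_class):
            target = train if index < n_train else test
            target.append((label, index))
    return train, test


# Split sizes for 5, 3, and 1 training images per character.
sizes = {n: tuple(len(part) for part in split_dataset(n_train=n))
         for n in (5, 3, 1)}
```

Under this assumption, 5, 3, and 1 training images per character leave 60, 120, and 180 testing images respectively.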
18. Experimental Scenarios
We designed three experimental scenarios, with the following aims:
• To select the most suitable parameters.
• To understand the effect of changing the number of training images.
• To show that our proposed method is robust against rotation.
• To show that our proposed method is robust against occlusion.
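For the occlusion aim, test images can be generated by blanking part of each image. The sketch below is one possible reading: "horizontal" blanks a band of rows from the top and "vertical" blanks a band of columns from the left. The slides do not specify which region or what fraction was occluded, so the direction names, the fraction, and the fill value are all assumptions.

```python
def occlude(image, fraction, direction="horizontal", fill=0):
    """Return a copy of the image with a band of pixels replaced by fill."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    if direction == "horizontal":
        # Blank the top rows.
        for y in range(int(height * fraction)):
            out[y] = [fill] * width
    elif direction == "vertical":
        # Blank the leftmost columns.
        cols = int(width * fraction)
        for row in out:
            row[:cols] = [fill] * cols
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    return out


image = [[1] * 4 for _ in range(4)]
top_hidden = occlude(image, 0.5, "horizontal")
left_hidden = occlude(image, 0.5, "vertical")
```

The original image is left untouched, so the same source image can be reused for several occlusion levels.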
20. Experimental Results
2nd Scenario: Different Numbers of Training Images
Accuracy (in %) of our approach using different numbers of training images

Classifier     No. of training images
               5      3      1
Min. Dist.     100    99.2   98.9
k-NN (k=5)     100    98.9   98.9
SVM            100    99.    98.9
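Accuracy here is the percentage of test images classified correctly. The slides report only percentages, not counts, so the short sketch below is a consistency check rather than a reproduction of the experiment. It assumes that all images not used for training were used for testing (120 test images with 3 training images per character, 180 with 1), and the error counts are hypothetical.

```python
def accuracy_percent(n_correct, n_total):
    """Percentage of test images classified correctly."""
    return 100.0 * n_correct / n_total


# Hypothetical counts: one error in 120 tests, two errors in 180 tests.
one_error_of_120 = round(accuracy_percent(119, 120), 1)
two_errors_of_180 = round(accuracy_percent(178, 180), 1)
```

Under that assumption, a single misclassified image out of 120 gives 99.2% and two out of 180 give 98.9%, so differences of a few tenths of a percent in the table correspond to one or two images.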
21. Experimental Results
3rd Scenario: Rotated Images
Accuracy (in %) of our approach when rotated images are used
(F.E.M. = feature extraction method)

F.E.M.  Matching      Angle of rotation (°)
                      0     45    90    135   180   225   270   315
SIFT    Min. Dist.    100   98.9  97.8  96.7  100   97.8  100   98.9
SIFT    k-NN (k=5)    100   100   100   96.7  100   98.9  100   100
SIFT    SVM           100   100   98.9  98.9  100   98.9  100   100
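Rotated test images for the angles in the table can be produced with a simple nearest-neighbour rotation about the image centre, sketched below. How the authors rotated their images is not stated, so the interpolation method, the fill value for pixels that fall outside the source image, and the function name are assumptions.

```python
import math

ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]  # angles listed in the table


def rotate(image, degrees, fill=0):
    """Rotate an image about its centre using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = image[iy][ix]
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rotated = {angle: rotate(image, angle) for angle in ANGLES}
```

A 180° rotation reverses both rows and columns, and the centre pixel stays in place for every angle, which makes these two cases convenient sanity checks.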
23. Experimental Results
A comparison between the proposed system and previous systems.

Author                   Accuracy (in %)
K. Assaleh et al. [1]    93.5
Al-Jarrah et al. [6]     94.4
Al-Jarrah et al. [9]     97.5
Mohandes et al. [12]     87
Our proposed approach    99
25. Conclusions
Our proposed approach for ArSL recognition:
• Achieves excellent accuracy in identifying ArSL characters from 2D images.
• Is robust to images rotated at different angles and to images occluded horizontally or vertically.
• Compares favorably with many previous ArSL approaches.
The performance of this approach was measured by:
• Using captured images with a Matlab implementation
• Comparison with related work
26. Future Work
• Improve the results in the case of image occlusion.
• Increase the size of the dataset to check the scalability of the approach.
• Identify characters from video frames, and then try to implement a real-time ArSL system.
27. Thanks
Acknowledgement to the respected co-authors:
Abul Ella Hassenian3,4, M. K. Shahin1, Basma Refaat1
1Faculty of Eng., Suez Canal University, Ismailia, Egypt
2Faculty of Computers & Informatics, Suez Canal University, Ismailia, Egypt
3Faculty of Computers and Information, Cairo University, Egypt
4Scientific Research Group in Egypt (SRGE), http://www.egyptscience.net