Reducing Language Barriers for Tourists Using Handwriting Recognition Enabled Mobile Application
1. Reducing Language Barriers for Tourists Using Handwriting Recognition Enabled Mobile Application
• Edgard Chammas, Chafic Mokbel
University of Balamand
Balamand, Lebanon
• Rami Al Hajj Mohamad
American University of the Middle East
Egaila, Kuwait
• Cristina Oprean, Laurence Likforman Sulem, Gérard Chollet
TELECOM ParisTech
Paris, France
ACTEA 2012
2. INTRODUCTION
• The goal is to reduce the language barriers for tourists
who do not speak the Arabic language
– Take a photo of the text using your smart phone and get the
corresponding information in your own language
• Handwriting Recognition enabled mobile application
has been developed for this purpose
– Robust recognition engine to recognize both handwritten and
printed noisy texts
– Two specific vocabularies:
• Villages and Town Names
• Restaurant Menu entries
3. HANDWRITING RECOGNITION SYSTEM
Based on Hidden Markov Models (HMMs)
Feature vectors are extracted from the input image
by a sliding window characterized by its size,
overlap and sliding direction
20 parameters and their derivatives are computed
by automatically dividing the window into 21 cells
and defining the baseline
The sequence of feature vectors is modeled by
HMMs
An HMM is dedicated to each variant of a letter
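The sliding-window extraction described above can be sketched as follows. This is only an illustration under stated assumptions: the slide does not list the 20 parameters or the 21-cell layout, so the ink density of a few horizontal bands stands in for them, and the function name and default values (`width`, `overlap`, `n_cells`) are hypothetical.

```python
import numpy as np

def sliding_window_features(image, width=8, overlap=4, n_cells=4, right_to_left=True):
    """Return one feature vector per window position.

    image: 2-D array, 1 = ink, 0 = background.
    Output shape: (n_frames, 2 * n_cells), static features then derivatives.
    """
    step = width - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than width")
    starts = list(range(0, image.shape[1] - width + 1, step))
    if len(starts) < 2:
        raise ValueError("image too narrow for at least two windows")
    if right_to_left:
        starts.reverse()  # Arabic script is read right to left
    frames = []
    for s in starts:
        window = image[:, s:s + width]
        # Stand-in features (assumption): ink density of each horizontal band.
        bands = np.array_split(window, n_cells, axis=0)
        frames.append([band.mean() for band in bands])
    feats = np.asarray(frames, dtype=float)
    # Derivatives approximated by finite differences between successive frames.
    deltas = np.gradient(feats, axis=0)
    return np.hstack([feats, deltas])
```

Each row of the result corresponds to one window position, so the rows form the observation sequence that the HMMs model.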
4. HANDWRITING RECOGNITION SYSTEM
The HMMs are trained with the Expectation-Maximization (EM)
algorithm, which iteratively re-estimates their parameters and
guarantees that the likelihood function does not decrease from
one iteration to the next
The Viterbi algorithm identifies the written text by
determining the most likely state sequence.
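A minimal sketch of Viterbi decoding in the log domain is given below. It is the generic textbook recursion, not the toolkit's implementation; the argument names (`log_start`, `log_trans`, `log_emit`) are assumptions chosen for this example.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state sequence of an HMM.

    log_start: (S,) log initial probabilities.
    log_trans: (S, S) log transition probabilities, row = from, column = to.
    log_emit:  (T, S) log likelihood of each frame under each state.
    Returns (path, log_probability_of_path).
    """
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)   # best score ending in each state
    back = np.zeros((T, S), dtype=int) # best predecessor of each state
    delta[0] = log_start + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    # Trace back from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(delta[-1].max())
```

In a word recognizer, the decoded state sequence is mapped back to the letter models it passes through, and the path score ranks the candidate words.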
5. RECOGNITION OF PRINTED TEXTS
USING THE HANDWRITING
RECOGNITION SYSTEM
Problem 1: Need for an efficient recognition system to
recognize both handwritten and printed texts
Hypothesis to be tested: If sufficient data covering an
extended set of variability in handwritten texts is used to
train the handwriting recognition system, then this
system would be useful in recognizing both handwritten
and printed texts
The printed text can be considered as the median
form of the corresponding handwritten text
The handwritten texts are variations of the
corresponding printed texts
6. RECOGNITION OF PRINTED TEXTS
USING THE HANDWRITING
RECOGNITION SYSTEM
• Problem 2: The system should be easily
adaptable to any vocabulary
– Changing the vocabulary should require neither collecting
additional data nor retraining
• Solution to be tested: Word models are the
concatenation of letter models
– Any new word model can be built automatically by
concatenating the models of its letters
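The concatenation idea can be illustrated with transition matrices. The sketch below assumes each letter model is a left-to-right HMM stored as a pair (transition matrix, exit probability); this representation and the helper names are hypothetical, and emission distributions are left out for brevity.

```python
import numpy as np

def left_to_right_letter(n_states=3, stay=0.6):
    """Hypothetical letter model: self-loops plus moves to the next state.

    Returns (A, exit_prob), where exit_prob is the probability of leaving
    the last state of the letter.
    """
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        A[i, i] = stay
        if i + 1 < n_states:
            A[i, i + 1] = 1.0 - stay
    return A, 1.0 - stay

def concatenate_models(letter_models):
    """Chain letter models into one word-level transition matrix."""
    total = sum(A.shape[0] for A, _ in letter_models)
    word = np.zeros((total, total))
    offset = 0
    for k, (A, exit_prob) in enumerate(letter_models):
        n = A.shape[0]
        word[offset:offset + n, offset:offset + n] = A
        if k + 1 < len(letter_models):
            # Last state of this letter feeds the first state of the next one.
            word[offset + n - 1, offset + n] = exit_prob
        offset += n
    return word

def build_word_model(word, inventory):
    """Look up each symbol of `word` in the trained letter inventory."""
    return concatenate_models([inventory[symbol] for symbol in word])
```

Because a model is dedicated to each variant of a letter, the keys of `inventory` would in practice be letter variants rather than plain characters. A new vocabulary entry then only needs a lookup and a concatenation, with no retraining.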
7. EXPERIMENTS AND RESULTS
Balamand HCM toolkit used for both training and
recognition using HMMs
State-of-the-art UOB-ENST system trained on the IFN/ENIT
database of Tunisian village names
Vocabulary set of 946 names, written by more than 400
writers. 26000 images are used for training
The current recognition rate of the system on handwritten
words is 90.9%
[Figure: sample image from the IFN/ENIT training database]
8. EXPERIMENTS AND RESULTS
Need to evaluate performance on printed texts.
Target vocabulary: Lebanese village names and restaurant
menu entries.
The HMM models of words are obtained by concatenating
the character HMMs trained on the IFN/ENIT database
No printed texts have been included in the training sets
Small test database has been collected:
40 menu entries corresponding to Lebanese
specialties
40 Lebanese towns and villages
9. EXPERIMENTS AND RESULTS
A test set of 240 images is constructed by
typing these menu entries and town and village
names in 3 different fonts
Sample images from the locally collected test databases
10. EXPERIMENTS AND RESULTS
Only 2 errors on the 120 menu entry images
tested and 3 errors on the town and village
name images
About 2% error rate overall (5 errors in 240 images)
If we consider the top 5 solutions, then this
system makes no errors on this printed database
even when dealing with a completely different
vocabulary
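The figures above follow from simple counting, and the top-5 criterion counts an image as correct when the reference word appears anywhere among the first 5 hypotheses. A small sketch, with a hypothetical function name and made-up toy data used only to exercise it:

```python
def top_n_error_rate(ranked_hypotheses, references, n=1):
    """Fraction of items whose reference is absent from the top n hypotheses."""
    errors = sum(
        1 for hyps, ref in zip(ranked_hypotheses, references)
        if ref not in hyps[:n]
    )
    return errors / len(references)

# Error rate reported on the slides: 2 + 3 errors over 240 images.
reported_error_rate = (2 + 3) / 240

# Toy data (not from the experiments): the last item is wrong at rank 1
# but recovered at rank 2.
toy_hypotheses = [["a", "b"], ["c", "d"], ["e", "f"], ["h", "g"]]
toy_references = ["a", "c", "e", "g"]
toy_top1 = top_n_error_rate(toy_hypotheses, toy_references, n=1)
toy_top2 = top_n_error_rate(toy_hypotheses, toy_references, n=2)
```

The reported rate evaluates to roughly 0.021, which the slides round to 2%.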
13. CONCLUSIONS
A novel approach combining handwritten and
printed text recognition in a mobile application
The HMM-based handwriting recognition system achieved
only a 2% error rate on typed texts
High accuracy is also maintained with handwritten words
belonging to a vocabulary different from the one used
in training the HMM models
The N-best solutions returned by the server to the user
make the application robust to recognition errors
14. FUTURE WORK
The simultaneous recognition of handwritten and
printed texts will be further studied
The effect on performance of adding printed texts
to the training set will be measured and
interpreted
The geographical position provided by the device
could be sent to the server to guide the
recognition by limiting the vocabulary in use
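One way the position-based vocabulary restriction could work is to keep only the place names within some radius of the device. The sketch below is an assumption about a possible design, not something described on the slides: the function names, the radius, and the small gazetteer with approximate coordinates are illustrative only.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_vocabulary(places, lat, lon, radius_km):
    """Names of the places lying within radius_km of the device position."""
    return [
        name for name, (plat, plon) in places.items()
        if haversine_km(lat, lon, plat, plon) <= radius_km
    ]

# Illustrative gazetteer with approximate coordinates (assumption).
PLACES = {
    "Beirut": (33.89, 35.50),
    "Tripoli": (34.43, 35.84),
    "Tyre": (33.27, 35.20),
}
```

The server would then build word models only for the returned names, which shrinks the search space before decoding.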