SlideShare a Scribd company logo
1 of 10
Online Handwritten Script Recognition

(Synopsis)
Abstract
Automatic identification of handwritten script facilitates many
important applications such as automatic transcription of multilingual
documents and search for documents on the Web containing a
particular

script. The increase in usage of handheld devices which

accept handwritten input has created a growing demand for algorithms
that can efficiently analyze and retrieve handwritten data. This project
proposes a method to classify words and lines in an online handwritten
document into one of the six major scripts: Arabic, Cyrillic, Devnagari,
Han, Hebrew, or Roman. The classification is based on 11 different
spatial and temporal features extracted from the strokes of the words.
The proposed system attains an overall classification accuracy of 87.1
percent at the word level with 5-fold cross validation on a data set
containing 13,379 words. The classification accuracy improves to
95 percent as the number of words in the test sample is increased to
five, and to 95.5 percent for complete text lines consisting of an
average of seven words.
Description of Problem
The field of work on automatic script recognition deals with
offline documents, i.e., documents which are either handwritten or
printed on a paper and then scanned to obtain a two-dimensional
digital representation. The existing method deals with script
recognition in off-line. Today we have lot of electronic devices and lot
of input devices are available such as digital pen, digital tablet, stylus
through which user can input any type of script in side computer. Such
devices, which generate handwritten documents with online or
dynamic (temporal) information, require efficient algorithms for
processing and retrieving handwritten data. Online documents may be
written in different languages and scripts. A single document page in
itself may contain text written in multiple scripts. So the project aims
to develop efficient algorithm to recognize the script.

Existing Method
The existing method deals with languages are identified
•

Using projection profiles of words and character shapes.

•

Using horizontal projection profiles and looking for the
presence or absence of specific shapes in different scripts.

•

Existing method deals with only few characteristics
•

Most of the method does this in off-line

Proposed System
The proposed method uses the features of connected
components to classify six different scripts (Arabic, Chinese,
Cyrillic, Devnagari, Japanese, and Roman) and reported a
classification accuracy of 88 percent on document pages. There
are a few important aspects of online documents that enable us
to process them in a fundamentally different way than offline
documents. The most important characteristic of online
documents is that they capture the temporal sequence of strokes
while writing the document. This allows us to analyze the
individual strokes and use the additional temporal information
for both script identification as well as text recognition.
In the case of online documents, segmentation of
foreground from the background is a relatively simple task as
the captured data, i.e., the (x; y) coordinates of the locus of the
stylus, defines the characters and any other point on the page
belongs to the background. We use stroke properties as well as
the spatial and temporal information of a collection of strokes to
identify the script used in the document. Unfortunately, the
temporal information also introduces additional variability to the
handwritten characters, which creates large intraclass variations
of strokes in each of the script classes.
INTRODUCTION
Now-a-days the increase in popularity of portable computing
devices
such as PDAs and handheld computers , nonkeyboardbased methods
for data entry are receiving more attention in the research
communities and commercial sector. The most promising options are
pen-based and voice-based inputs. Digitizing devices like SmartBoards
and computing platforms such as the IBM Thinkpad TransNote and
Tablet PCs , have a pen-based user interface. Such devices, which
generate handwritten documents with online or dynamic (temporal)
information, require efficient algorithms for processing and retrieving
handwritten data. Online documents may be written in different
languages and scripts. A single document page in itself may contain
text written in multiple scripts. For example, a document in English
may have some annotations or edits in another language. Most of the
text recognition algorithms are designed to work with a particular
script and treat any input text as being written only in the script
under consideration. Therefore, an online document analyzer must first
identify the script before employing a particular algorithm for text
recognition.
Note that a typical multilingual, online document contains two or
three scripts . The document page shown in Fig. 1 was created by us
for illustrating the results.
Note that the writing style in English is mixed, with both cursive and
handprint words. For lexicon-based recognizers, it may be helpful to
identify the specific language of the text if the same script is used by
multiple languages.
A script is defined as a graphic form of a writing system.
Different scripts may follow the same writing system. For example, the
alphabetic system is adopted by scripts like Roman and Greek, and the
phonetic-alphabetic system is adopted by most Indian scripts,
including Devnagari. A specific script like Roman may be used by
multiple languages such as English, German, and French. Detailed
studies of the history and evolution of scripts can be found in Jensen
and Lo. The six scripts considered in this work, named Arabic, Cyrillic,
Devnagari, Han, Hebrew, and Roman, cover the languages
used by a majority of the world population. The general class of Hanbased scripts include Chinese, Japanese, and Korean (we do not
consider Kana or Han-Gul). Devnagari script is used by many Indian
languages, including Hindi, Sanskrit, Marathi, and Rajasthani. Arabic
script is used by Arabic, Farsi, Urdu, etc. Roman script is used by
many European languages like English, German, French, and Italian.
We attempt to solve the problem of script recognition, where either an
entire document or a part of a
document (up to word level) is classified into one of the six scripts
mentioned above, with the aim of facilitating text recognition or
retrieval. The problem of identifying the actual language often involves
recognizing the text and identifying specific words or
sequence of characters, which is beyond the scope of this paper.Most
of the published work on automatic script recognition deals with offline
documents, i.e., documents which are either handwritten or printed on
a paper and then scanned to obtain a two-dimensional digital
representation. For printed documents, Hochberg et al. used clusterbased emplates for discriminating 13 different scripts. Spitz proposed
a language identification
scheme where the words of 26 different languages are first classified
into Han-based and Latin-based scripts. The actual languages are
identified using projection profiles of words and character shapes. Jain
and Zhong used Gabor filter-based texture features to segment a page
into regions containing Han and Roman scripts. Pal and Chaudhuri
have eveloped a system for identifying Indian scripts using horizontal
projection profiles and looking for the presence or absence of specific
shapes in different scripts. Other approaches to script and language
identification in printed documents.

System Requirement
Hardware specifications:
Processor
RAM

:
:

Intel Processor IV
128 MB

Hard disk

:

20 GB

CD drive

:

40 x Samsung

Floppy drive

:

1.44 MB

Monitor

:

15’ Samtron color

Keyboard
Mouse

:
:

108 mercury keyboard
Logitech mouse

Software Specification
Operating System – Windows XP/2000
Language used – J2sdk1.4.0
shapes in different scripts. Other approaches to script and language
identification in printed documents.

System Requirement
Hardware specifications:
Processor
RAM

:
:

Intel Processor IV
128 MB

Hard disk

:

20 GB

CD drive

:

40 x Samsung

Floppy drive

:

1.44 MB

Monitor

:

15’ Samtron color

Keyboard
Mouse

:
:

108 mercury keyboard
Logitech mouse

Software Specification
Operating System – Windows XP/2000
Language used – J2sdk1.4.0

More Related Content

What's hot

Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Sebastiano Panichella
 
An Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptAn Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptIJERA Editor
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionNaiyan Noor
 
Handwritten character recognition in
Handwritten character recognition inHandwritten character recognition in
Handwritten character recognition inijaia
 
Pre-Defense CSE Thesis Presentation in BAUST
Pre-Defense CSE Thesis Presentation in BAUSTPre-Defense CSE Thesis Presentation in BAUST
Pre-Defense CSE Thesis Presentation in BAUSTNaiyan Noor
 
Java Abs Online Handwritten Script Recognition
Java Abs   Online Handwritten Script RecognitionJava Abs   Online Handwritten Script Recognition
Java Abs Online Handwritten Script Recognitionncct
 
OCR Presentation (Optical Character Recognition)
OCR Presentation (Optical Character Recognition)OCR Presentation (Optical Character Recognition)
OCR Presentation (Optical Character Recognition)Neeraj Neupane
 
A Comprehensive Study On Handwritten Character Recognition System
A Comprehensive Study On Handwritten Character Recognition SystemA Comprehensive Study On Handwritten Character Recognition System
A Comprehensive Study On Handwritten Character Recognition Systemiosrjce
 
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...ijaia
 
ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationIJECEIAES
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 

What's hot (15)

Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?Using IR methods for labeling source code artifacts: Is it worthwhile?
Using IR methods for labeling source code artifacts: Is it worthwhile?
 
An Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari ScriptAn Optical Character Recognition for Handwritten Devanagari Script
An Optical Character Recognition for Handwritten Devanagari Script
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer Version
 
Handwritten character recognition in
Handwritten character recognition inHandwritten character recognition in
Handwritten character recognition in
 
Handwritten Character Recognition
Handwritten Character RecognitionHandwritten Character Recognition
Handwritten Character Recognition
 
DeepPavlov 2019
DeepPavlov 2019DeepPavlov 2019
DeepPavlov 2019
 
Pre-Defense CSE Thesis Presentation in BAUST
Pre-Defense CSE Thesis Presentation in BAUSTPre-Defense CSE Thesis Presentation in BAUST
Pre-Defense CSE Thesis Presentation in BAUST
 
Java Abs Online Handwritten Script Recognition
Java Abs   Online Handwritten Script RecognitionJava Abs   Online Handwritten Script Recognition
Java Abs Online Handwritten Script Recognition
 
OCR Presentation (Optical Character Recognition)
OCR Presentation (Optical Character Recognition)OCR Presentation (Optical Character Recognition)
OCR Presentation (Optical Character Recognition)
 
A Comprehensive Study On Handwritten Character Recognition System
A Comprehensive Study On Handwritten Character Recognition SystemA Comprehensive Study On Handwritten Character Recognition System
A Comprehensive Study On Handwritten Character Recognition System
 
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
AN EFFECTIVE ARABIC TEXT CLASSIFICATION APPROACH BASED ON KERNEL NAIVE BAYES ...
 
ATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliterationATAR: Attention-based LSTM for Arabizi transliteration
ATAR: Attention-based LSTM for Arabizi transliteration
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
FIRE2014_IIT-P
FIRE2014_IIT-PFIRE2014_IIT-P
FIRE2014_IIT-P
 
co:op-READ-Convention Marburg - Roger Labahn
co:op-READ-Convention Marburg - Roger Labahnco:op-READ-Convention Marburg - Roger Labahn
co:op-READ-Convention Marburg - Roger Labahn
 

Similar to Online handwritten script recognition (synopsis)

SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGEScscpconf
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesiosrjce
 
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...ijnlc
 
Dimension Reduction for Script Classification - Printed Indian Documents
Dimension Reduction for Script Classification - Printed Indian DocumentsDimension Reduction for Script Classification - Printed Indian Documents
Dimension Reduction for Script Classification - Printed Indian Documentsijait
 
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSDIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSijait
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALcsandit
 
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...CSCJournals
 
Script identification using dct coefficients 2
Script identification using dct coefficients 2Script identification using dct coefficients 2
Script identification using dct coefficients 2IAEME Publication
 
Language Identifier for Languages of Pakistan Including Arabic and Persian
Language Identifier for Languages of Pakistan Including Arabic and PersianLanguage Identifier for Languages of Pakistan Including Arabic and Persian
Language Identifier for Languages of Pakistan Including Arabic and PersianWaqas Tariq
 
Wavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationWavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationCSCJournals
 
Contextual Analysis for Middle Eastern Languages with Hidden Markov Models
Contextual Analysis for Middle Eastern Languages with Hidden Markov ModelsContextual Analysis for Middle Eastern Languages with Hidden Markov Models
Contextual Analysis for Middle Eastern Languages with Hidden Markov Modelsijnlc
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
Script identification from printed document images using statistical
Script identification from printed document images using statisticalScript identification from printed document images using statistical
Script identification from printed document images using statisticalIAEME Publication
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCRdbpublications
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For QuechuaAndrea Porter
 
A Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationA Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationIJERD Editor
 
An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...IJMER
 

Similar to Online handwritten script recognition (synopsis) (20)

SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGESSCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
SCRIPTS AND NUMERALS IDENTIFICATION FROM PRINTED MULTILINGUAL DOCUMENT IMAGES
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
A survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document imagesA survey on Script and Language identification for Handwritten document images
A survey on Script and Language identification for Handwritten document images
 
P01725105109
P01725105109P01725105109
P01725105109
 
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
WRITER RECOGNITION FOR SOUTH INDIAN LANGUAGES USING STATISTICAL FEATURE EXTRA...
 
Dimension Reduction for Script Classification - Printed Indian Documents
Dimension Reduction for Script Classification - Printed Indian DocumentsDimension Reduction for Script Classification - Printed Indian Documents
Dimension Reduction for Script Classification - Printed Indian Documents
 
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTSDIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
DIMENSION REDUCTION FOR SCRIPT CLASSIFICATION- PRINTED INDIAN DOCUMENTS
 
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVALDICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
DICTIONARY BASED AMHARIC-ARABIC CROSS LANGUAGE INFORMATION RETRIEVAL
 
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
Script Identification of Text Words from a Tri-Lingual Document Using Voting ...
 
Script identification using dct coefficients 2
Script identification using dct coefficients 2Script identification using dct coefficients 2
Script identification using dct coefficients 2
 
Language Identifier for Languages of Pakistan Including Arabic and Persian
Language Identifier for Languages of Pakistan Including Arabic and PersianLanguage Identifier for Languages of Pakistan Including Arabic and Persian
Language Identifier for Languages of Pakistan Including Arabic and Persian
 
Wavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script IdentificationWavelet Packet Based Features for Automatic Script Identification
Wavelet Packet Based Features for Automatic Script Identification
 
Hf3413291335
Hf3413291335Hf3413291335
Hf3413291335
 
Contextual Analysis for Middle Eastern Languages with Hidden Markov Models
Contextual Analysis for Middle Eastern Languages with Hidden Markov ModelsContextual Analysis for Middle Eastern Languages with Hidden Markov Models
Contextual Analysis for Middle Eastern Languages with Hidden Markov Models
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
Script identification from printed document images using statistical
Script identification from printed document images using statisticalScript identification from printed document images using statistical
Script identification from printed document images using statistical
 
Spell checker for Kannada OCR
Spell checker for Kannada OCRSpell checker for Kannada OCR
Spell checker for Kannada OCR
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
 
A Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text SummarizationA Survey of Various Methods for Text Summarization
A Survey of Various Methods for Text Summarization
 
An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...An Empirical Study on Identification of Strokes and their Significance in Scr...
An Empirical Study on Identification of Strokes and their Significance in Scr...
 

More from Mumbai Academisc

More from Mumbai Academisc (20)

Non ieee java projects list
Non  ieee java projects list Non  ieee java projects list
Non ieee java projects list
 
Non ieee dot net projects list
Non  ieee dot net projects list Non  ieee dot net projects list
Non ieee dot net projects list
 
Ieee java projects list
Ieee java projects list Ieee java projects list
Ieee java projects list
 
Ieee 2014 java projects list
Ieee 2014 java projects list Ieee 2014 java projects list
Ieee 2014 java projects list
 
Ieee 2014 dot net projects list
Ieee 2014 dot net projects list Ieee 2014 dot net projects list
Ieee 2014 dot net projects list
 
Ieee 2013 java projects list
Ieee 2013 java projects list Ieee 2013 java projects list
Ieee 2013 java projects list
 
Ieee 2013 dot net projects list
Ieee 2013 dot net projects listIeee 2013 dot net projects list
Ieee 2013 dot net projects list
 
Ieee 2012 dot net projects list
Ieee 2012 dot net projects listIeee 2012 dot net projects list
Ieee 2012 dot net projects list
 
Spring ppt
Spring pptSpring ppt
Spring ppt
 
Ejb notes
Ejb notesEjb notes
Ejb notes
 
Java web programming
Java web programmingJava web programming
Java web programming
 
Java programming-examples
Java programming-examplesJava programming-examples
Java programming-examples
 
Hibernate tutorial
Hibernate tutorialHibernate tutorial
Hibernate tutorial
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai Academics
 
Web based development
Web based developmentWeb based development
Web based development
 
Jdbc
JdbcJdbc
Jdbc
 
Java tutorial part 4
Java tutorial part 4Java tutorial part 4
Java tutorial part 4
 
Java tutorial part 3
Java tutorial part 3Java tutorial part 3
Java tutorial part 3
 
Java tutorial part 2
Java tutorial part 2Java tutorial part 2
Java tutorial part 2
 
Engineering
EngineeringEngineering
Engineering
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Online handwritten script recognition (synopsis)

  • 1. Online Handwritten Script Recognition (Synopsis)
  • 2. Abstract Automatic identification of handwritten script facilitates many important applications such as automatic transcription of multilingual documents and search for documents on the Web containing a particular script. The increase in usage of handheld devices which accept handwritten input has created a growing demand for algorithms that can efficiently analyze and retrieve handwritten data. This project proposes a method to classify words and lines in an online handwritten document into one of the six major scripts: Arabic, Cyrillic, Devnagari, Han, Hebrew, or Roman. The classification is based on 11 different spatial and temporal features extracted from the strokes of the words. The proposed system attains an overall classification accuracy of 87.1 percent at the word level with 5-fold cross validation on a data set containing 13,379 words. The classification accuracy improves to 95 percent as the number of words in the test sample is increased to five, and to 95.5 percent for complete text lines consisting of an average of seven words.
  • 3. Description of Problem The field of work on automatic script recognition deals with offline documents, i.e., documents which are either handwritten or printed on a paper and then scanned to obtain a two-dimensional digital representation. The existing method deals with script recognition in off-line. Today we have lot of electronic devices and lot of input devices are available such as digital pen, digital tablet, stylus through which user can input any type of script in side computer. Such devices, which generate handwritten documents with online or dynamic (temporal) information, require efficient algorithms for processing and retrieving handwritten data. Online documents may be written in different languages and scripts. A single document page in itself may contain text written in multiple scripts. So the project aims to develop efficient algorithm to recognize the script. Existing Method The existing method deals with languages are identified • Using projection profiles of words and character shapes. • Using horizontal projection profiles and looking for the presence or absence of specific shapes in different scripts. • Existing method deals with only few characteristics
  • 4. • Most of the method does this in off-line Proposed System The proposed method uses the features of connected components to classify six different scripts (Arabic, Chinese, Cyrillic, Devnagari, Japanese, and Roman) and reported a classification accuracy of 88 percent on document pages. There are a few important aspects of online documents that enable us to process them in a fundamentally different way than offline documents. The most important characteristic of online documents is that they capture the temporal sequence of strokes while writing the document. This allows us to analyze the individual strokes and use the additional temporal information for both script identification as well as text recognition. In the case of online documents, segmentation of foreground from the background is a relatively simple task as the captured data, i.e., the (x; y) coordinates of the locus of the stylus, defines the characters and any other point on the page belongs to the background. We use stroke properties as well as the spatial and temporal information of a collection of strokes to identify the script used in the document. Unfortunately, the temporal information also introduces additional variability to the
  • 5. handwritten characters, which creates large intraclass variations of strokes in each of the script classes.
  • 6. INTRODUCTION Now-a-days the increase in popularity of portable computing devices such as PDAs and handheld computers , nonkeyboardbased methods for data entry are receiving more attention in the research communities and commercial sector. The most promising options are pen-based and voice-based inputs. Digitizing devices like SmartBoards and computing platforms such as the IBM Thinkpad TransNote and Tablet PCs , have a pen-based user interface. Such devices, which generate handwritten documents with online or dynamic (temporal) information, require efficient algorithms for processing and retrieving handwritten data. Online documents may be written in different languages and scripts. A single document page in itself may contain text written in multiple scripts. For example, a document in English may have some annotations or edits in another language. Most of the text recognition algorithms are designed to work with a particular script and treat any input text as being written only in the script under consideration. Therefore, an online document analyzer must first identify the script before employing a particular algorithm for text recognition.
  • 7. Note that a typical multilingual, online document contains two or three scripts . The document page shown in Fig. 1 was created by us for illustrating the results. Note that the writing style in English is mixed, with both cursive and handprint words. For lexicon-based recognizers, it may be helpful to identify the specific language of the text if the same script is used by multiple languages. A script is defined as a graphic form of a writing system. Different scripts may follow the same writing system. For example, the alphabetic system is adopted by scripts like Roman and Greek, and the phonetic-alphabetic system is adopted by most Indian scripts, including Devnagari. A specific script like Roman may be used by multiple languages such as English, German, and French. Detailed studies of the history and evolution of scripts can be found in Jensen and Lo. The six scripts considered in this work, named Arabic, Cyrillic, Devnagari, Han, Hebrew, and Roman, cover the languages used by a majority of the world population. The general class of Hanbased scripts include Chinese, Japanese, and Korean (we do not consider Kana or Han-Gul). Devnagari script is used by many Indian languages, including Hindi, Sanskrit, Marathi, and Rajasthani. Arabic script is used by Arabic, Farsi, Urdu, etc. Roman script is used by
  • 8. many European languages like English, German, French, and Italian. We attempt to solve the problem of script recognition, where either an entire document or a part of a document (up to word level) is classified into one of the six scripts mentioned above, with the aim of facilitating text recognition or retrieval. The problem of identifying the actual language often involves recognizing the text and identifying specific words or sequence of characters, which is beyond the scope of this paper.Most of the published work on automatic script recognition deals with offline documents, i.e., documents which are either handwritten or printed on a paper and then scanned to obtain a two-dimensional digital representation. For printed documents, Hochberg et al. used clusterbased emplates for discriminating 13 different scripts. Spitz proposed a language identification scheme where the words of 26 different languages are first classified into Han-based and Latin-based scripts. The actual languages are identified using projection profiles of words and character shapes. Jain and Zhong used Gabor filter-based texture features to segment a page into regions containing Han and Roman scripts. Pal and Chaudhuri have eveloped a system for identifying Indian scripts using horizontal projection profiles and looking for the presence or absence of specific
  • 9. shapes in different scripts. Other approaches to script and language identification in printed documents. System Requirement Hardware specifications: Processor RAM : : Intel Processor IV 128 MB Hard disk : 20 GB CD drive : 40 x Samsung Floppy drive : 1.44 MB Monitor : 15’ Samtron color Keyboard Mouse : : 108 mercury keyboard Logitech mouse Software Specification Operating System – Windows XP/2000 Language used – J2sdk1.4.0
  • 10. shapes in different scripts. Other approaches to script and language identification in printed documents. System Requirement Hardware specifications: Processor RAM : : Intel Processor IV 128 MB Hard disk : 20 GB CD drive : 40 x Samsung Floppy drive : 1.44 MB Monitor : 15’ Samtron color Keyboard Mouse : : 108 mercury keyboard Logitech mouse Software Specification Operating System – Windows XP/2000 Language used – J2sdk1.4.0