Optical character recognition
OCR
Shobhit Saxena
Amity University
Saxenashobhit1988@gmail.com
Nidhi Sharma
Amity University
nidhi9392@gmail.com
Abstract—Optical character recognition, usually abbreviated to
OCR, is the mechanical or electronic conversion of scanned images
of handwritten, typewritten or printed text into machine-encoded text.
It is a system that provides a full alphanumeric recognition of printed
or handwritten characters at electronic speed by simply scanning the
form. It is widely used as a form of data entry from some sort of
original paper data source, whether documents, sales receipts, mail,
or any number of printed records.
It is a common method of digitizing printed texts so that they can be
electronically searched, stored more compactly, displayed on-line,
and used in machine processes such as machine translation,
text-to-speech and text mining. OCR is a field of research in pattern
recognition, artificial intelligence and computer vision. More
recently, the term Intelligent Character Recognition (ICR) has been
used to describe the process of interpreting image data, in particular
alphanumeric text.
I. INTRODUCTION
The area of Optical Character Recognition (OCR) involves
locating the characters in an image and converting them into text
files. The characters in the image cannot be processed as they are
and need to be represented in a suitable character encoding.
With computers available cheaply and digital data convenient to
work with, everyone aims to store data in digital form. Hard copies
of text documents are digitized by scanning them as image files.
These images do not support text-based operations such as editing
and summarizing, and it is quite tedious to enter the data into
computer systems manually. This is where OCR comes into play.
II. PROBLEMS AND MOTIVATION
Currently, OCR is available as a beta product (a product in the
experimental stage) and research is still being carried out in this field.
OCR employs a part of Artificial Intelligence, which is itself a
topic under active research.
The main problem of an OCR system is to correctly interpret the images
of characters. This procedure makes use of pattern classification
algorithms. Several algorithms are available, and others are being
formulated, any of which can be chosen for an OCR implementation.
III. APPROACH AND GOAL OF THIS PROJECT
The goal of this project is to implement OCR using the LVQ
(Learning Vector Quantization) algorithm. This project uses the
supervised learning approach to pattern classification. First the
image is examined to detect the possible presence of a
character, and if a character is found, it is associated with a suitable
character code. The main steps involved in OCR are:
A. Pre-processing
The digitized images are usually in gray tone, and for a clear
document, a simple histogram-based threshold approach is sufficient
for converting them to two-tone images. The histogram of gray
values of the pixels shows two prominent peaks, and a middle gray
value located between the peaks is a good choice for the threshold.
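The thresholding step can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes 8-bit gray values stored as a list of rows, assumes that one peak lies in the dark half (0 to 127) and the other in the light half (128 to 255) of the histogram, and the names histogram_threshold and binarize are hypothetical.

```python
def histogram_threshold(gray):
    """Return a threshold midway between the dark and light histogram peaks."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    # Assumption: the ink peak is in the dark half, the paper peak in the light half.
    dark_peak = max(range(0, 128), key=lambda g: hist[g])
    light_peak = max(range(128, 256), key=lambda g: hist[g])
    return (dark_peak + light_peak) // 2


def binarize(gray, threshold):
    """Map pixels darker than the threshold to 1 (ink) and the rest to 0 (paper)."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


page = [
    [250, 250, 20, 250],
    [250, 30, 20, 250],
    [250, 250, 20, 240],
]
threshold = histogram_threshold(page)
binary = binarize(page, threshold)
```

For noisy or unevenly lit pages the two-peak assumption may not hold, in which case a more robust threshold selection method would be needed.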
For salt and pepper noise we generally use a median filter. A median
filter replaces the value of a pixel by the median of the gray levels in the
neighborhood of that pixel (the original value of the pixel is included
in the computation of the median). Median filters provide excellent
noise reduction capabilities, with considerably less blurring than linear
smoothing filters of similar size.
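A minimal sketch of such a median filter is given below. It assumes a square neighborhood (3x3 by default) and, as a simplification of our own, clips the neighborhood at the image border instead of padding; the function name median_filter is hypothetical.

```python
def median_filter(gray, radius=1):
    """Replace each pixel by the median of its (2*radius+1)^2 neighborhood.

    The pixel itself is part of the neighborhood. At the border the
    neighborhood is clipped to the image, so it holds fewer values there;
    for an even number of values the upper of the two middle values is used.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out


# A flat gray patch with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [[200] * 5 for _ in range(5)]
noisy[0][0] = 255
noisy[2][2] = 0
clean = median_filter(noisy)
```

Because isolated outliers never reach the middle of the sorted neighborhood, both noisy pixels are replaced by the surrounding gray level while the rest of the patch is unchanged.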
(The image illustrated below is only an example)
Fig 1.1 Image with salt and pepper noise
Fig 1.2 Image without salt and pepper noise
B. Segmentation
Segmentation is one of the most important phases of an OCR system. By
applying good segmentation techniques we can increase the
performance of OCR. Segmentation subdivides an image into its
constituent regions or objects. In segmentation, we try to
extract the basic constituents of the script, which are the characters.
This is needed because our classifier recognizes only these
characters. The segmentation phase is also a significant source of
recognition errors because of touching characters, which the classifier
cannot properly handle. Even in good quality documents, some adjacent
characters touch each other due to inappropriate scanning resolution.
• Segmentation of Line: Text lines are detected by horizontal
scanning. To segment lines, we scan the scanned
document page horizontally from the top and find the last
row containing all white pixels before a black pixel is found.
Then we find the first row containing all white pixels just
after the end of the black pixels. We repeat this process over the
entire page to find all lines.
• Segmentation of Words: After finding a particular line we
separate the individual words. This is done by vertical scanning.
• Segmentation of Individual Characters: Once we get the
words we segment them into individual characters. Before
segmenting words into individual characters, we locate the
head line. This is done by finding the rows having the maximum
number of black pixels in a word. After locating the head line we
remove it, i.e. convert it to white pixels. After removing the
head line our word is divided into three horizontal parts
known as the upper zone, middle zone and lower zone.
Individual characters are separated from each zone by
applying vertical scanning.
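The horizontal and vertical scanning steps above can be sketched as follows. This is only an illustration under stated assumptions: the page is a binary image stored as a list of rows with 1 for black and 0 for white, and the names runs, segment_lines and segment_columns as well as the min_gap parameter are hypothetical. Treating a white gap of at least min_gap columns as a word boundary is our assumption; the text does not specify how word gaps are told apart from character gaps.

```python
def runs(flags):
    """Return (start, end) pairs, end exclusive, for each run of true values."""
    spans, start = [], None
    for index, flag in enumerate(flags):
        if flag and start is None:
            start = index
        elif not flag and start is not None:
            spans.append((start, index))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def segment_lines(binary):
    """Horizontal scanning: a text line is a run of rows that contain ink."""
    return runs([any(row) for row in binary])


def segment_columns(binary, top, bottom, min_gap=1):
    """Vertical scanning inside rows top..bottom-1.

    Runs of inked columns separated by fewer than min_gap white columns
    are merged, so a larger min_gap yields words instead of characters.
    """
    width = len(binary[0])
    has_ink = [any(binary[y][x] for y in range(top, bottom)) for x in range(width)]
    merged = []
    for start, end in runs(has_ink):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
top, bottom = lines[0]
characters = segment_columns(page, top, bottom)
words = segment_columns(page, top, bottom, min_gap=3)
```

Touching characters produce a single run of inked columns, which is why this simple scan cannot separate them.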
Fig 1.3 Output of the segmentation
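Locating and removing the head line, as described for character segmentation, can be sketched as below. The sketch assumes a binary word image stored as a list of rows with 1 for black and 0 for white; the function names are hypothetical, and treating every row that ties for the maximum count as part of the head line is our reading of the description.

```python
def locate_head_line(word):
    """Return the indices of the rows holding the maximum number of black pixels."""
    counts = [sum(row) for row in word]
    peak = max(counts)
    return [y for y, count in enumerate(counts) if count == peak]


def remove_head_line(word):
    """Return a copy of the word image with the head line rows turned white."""
    head_rows = set(locate_head_line(word))
    return [
        [0] * len(row) if y in head_rows else list(row)
        for y, row in enumerate(word)
    ]


word = [
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
]
head_rows = locate_head_line(word)
without_head = remove_head_line(word)
```

In this example the rows above the head line form the upper zone and the rows below it hold two strokes that no longer touch, so vertical scanning can separate them.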
Classification
Classification is performed based on the extracted features.
For initial classification of characters, we consider three
features as follows:
• Mean Distance
• Histogram of projection based on pixel value
• Histogram of projection based on spatial position of pixel
Feature Extraction:
Feature extraction is one of the most important steps in developing a
classification system. This step describes the various features selected
by us for classification of the selected characters.
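Since the project names LVQ as its classifier, a minimal sketch of the basic LVQ1 update rule may be helpful here. It is not the project's code: the two-element feature vectors, the initial prototypes, the linearly decaying learning rate and all function names are assumptions made for illustration. In LVQ1 the prototype nearest to a training sample is moved toward the sample when their labels agree and away from it when they differ.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(prototypes, x):
    """Index of the prototype closest to feature vector x."""
    return min(
        range(len(prototypes)),
        key=lambda i: squared_distance(prototypes[i][0], x),
    )


def train_lvq1(prototypes, samples, learning_rate=0.3, epochs=20):
    """Train (vector, label) prototypes on (vector, label) samples with LVQ1."""
    protos = [(list(vector), label) for vector, label in prototypes]
    for epoch in range(epochs):
        # Assumption: the learning rate decays linearly toward zero.
        rate = learning_rate * (1 - epoch / epochs)
        for x, label in samples:
            vector, proto_label = protos[nearest(protos, x)]
            sign = 1.0 if proto_label == label else -1.0
            for d in range(len(vector)):
                vector[d] += sign * rate * (x[d] - vector[d])
    return protos


def classify(prototypes, x):
    """Label of the prototype closest to x."""
    return prototypes[nearest(prototypes, x)][1]


# Hypothetical feature vectors for two character classes.
samples = [
    ([0.0, 1.0], "I"), ([0.1, 0.9], "I"),
    ([1.0, 0.0], "O"), ([0.9, 0.1], "O"),
]
initial = [([0.2, 0.8], "I"), ([0.8, 0.2], "O")]
trained = train_lvq1(initial, samples)
```

After training, classify(trained, features) returns the character label of the closest prototype; in a full system the feature vector would hold the features listed above.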
Fig 1.4 Control flow of OCR
C. Where are we today?
The advent of the array method of scanning, coupled with the
higher speeds and more compact computing power, has led to the
concept of "Image Processing". Image processing does not have
to utilize optical recognition to be successful. For example, the
ability to change any document to an electronically digitized item
may effectively replace microfilm devices. This provides the user
a much more convenient method of sorting images compared to
handling actual documents or microfilm pictures. Image
processing relies on larger more complex arrays than early third
generation OCR scanners.
COMPARISON TABLE OF OCR AND OMR
ITEM                                     OCR   OMR
Handprint recognition                    Y     N
Machine print recognition                Y     N
Recognition of checks and "X"s           Y     Y
Requires timing tracks/form IDs          N     Y
Requires registration marks              Y     N
Electronic image storage and retrieval   Y     N
D. Design objectives:
The design objectives include the key points to be considered in
designing the software. Some of the important design issues to be
dealt with are:
System must be user friendly: A system is no good if it does not
ease the work of its operator and cannot be used easily. User friendly
systems are easy to use and adapt.
System must make the task comprehensible: A system must have
clear objectives and should be capable of doing exactly what it is
told to do.
System must be transparent: The working of the system must be
clear to the user so that it can easily be modified and it is possible to
troubleshoot it.
2. Acquiring input data:
Acquiring input data indicates the various issues in feeding the data
for processing.
Input data can be acquired by two methods:
Scanning the image: This method involves the use of scanners.
Obtaining the pre-scanned image: In this method an image is fed
to the OCR system.
3. Components of OCR:
3.1 Character Tracer:
This component will locate characters in the image.
3.2 Mean Square Error Recognizer:
Recognizes characters of target image, with the help of training
image.
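A recognizer of this kind can be sketched as template matching by mean square error. The sketch below is an assumption about how such a component might work, not a description of the project's code: it assumes the traced character and every training template are binary images of the same size, and the names mean_square_error and recognize are hypothetical.

```python
def mean_square_error(a, b):
    """Mean of the squared pixel differences between two equally sized images."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for p, q in zip(row_a, row_b):
            total += (p - q) ** 2
            count += 1
    return total / count


def recognize(glyph, templates):
    """Return the character whose training template has the lowest error."""
    return min(templates, key=lambda ch: mean_square_error(glyph, templates[ch]))


# Hypothetical 3x3 training templates.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# An "I" with one extra black pixel.
target = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
result = recognize(target, templates)
```

In practice the traced characters would first have to be scaled to the template size, a step this sketch leaves out.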
3.3 Handwriting Recognizer:
This feature learns to recognize the handwriting of an individual. It
has the following subcomponents:
a) Training image:
This image is used as reference image.
b) Configure:
This component chooses the recognizer process from MSE and
Character aspect ratio analyzer.
c) Process:
This feature processes the target image.
3.4 Validation rules:
Validation rules are used to ensure proper functioning of any system.
Validation rules can be applied on both input as well as output.
3.4 Validation rules for image fed to OCR System
Format: The input image should be in a proper format; generally,
images stored as raster graphics are easy to process and interpret.
Memory size: The image should be within proper memory size
limits.
Resolution: Resolution of image determines the quality of the image
and its dimensions. Image should be of appropriate resolution
(400x400).
Availability: Availability means that the image must be present at
the location specified and should exist at the time of processing by
the system.
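These four input rules could be checked along the following lines. The sketch makes several assumptions that are not part of the rules themselves: it accepts only PNG files (whose width and height are stored in the IHDR chunk directly after the 8-byte signature), it uses an arbitrary 5 MB size limit, and the function name and message strings are hypothetical.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def validate_input_image(path, max_bytes=5_000_000, expected_size=(400, 400)):
    """Return a list of rule violations; an empty list means the image passed."""
    # Availability: the file must exist at the given location.
    if not os.path.isfile(path):
        return ["Image not found"]
    problems = []
    # Memory size: the file must not exceed the (assumed) limit.
    if os.path.getsize(path) > max_bytes:
        problems.append("Image exceeds memory size limit")
    # Format: signature (8 bytes), chunk length (4), chunk type "IHDR" (4),
    # then width (4) and height (4) as big-endian unsigned integers.
    with open(path, "rb") as handle:
        header = handle.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        problems.append("Invalid format")
        return problems
    # Resolution: compare the stored dimensions with the expected ones.
    width, height = struct.unpack(">II", header[16:24])
    if (width, height) != expected_size:
        problems.append(f"Unexpected resolution {width}x{height}")
    return problems
```

A caller would run validate_input_image(path) before processing and show the returned messages to the user if the list is not empty.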
3.5 Validation rules for the output:
Once the input is fed properly and is processed, the output should be
validated before presenting it to the user. Some of the common
validation rules for the output are:
Character encoding: The characters of the output file should be
encoded using a proper method. The most commonly used encoding
is Unicode, because it can represent the characters of nearly every
language.
Number of characters generated: A sufficient number of
characters must be recognized to make the output meaningful. The
output is of no use if the OCR system is not able to recognize the
majority of characters.
Format of output file: The output file should be in a format that
suits user. The output file will be checked for errors and consistency
using checksums that are calculated at various stages of processing.
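The output rules can be illustrated with a short sketch. It rests on assumptions of our own: that the recognizer marks every character it could not recognize with the Unicode replacement character U+FFFD, that the output is stored as UTF-8, and that SHA-256 is the checksum; the function names are hypothetical.

```python
import hashlib

UNRECOGNIZED = "\ufffd"  # assumed marker for a character the OCR could not read


def recognized_ratio(text):
    """Fraction of non-whitespace characters that were recognized."""
    glyphs = [ch for ch in text if not ch.isspace()]
    if not glyphs:
        return 0.0
    return sum(ch != UNRECOGNIZED for ch in glyphs) / len(glyphs)


def validate_output(text):
    """Encode the text as UTF-8 and report the checks described above."""
    data = text.encode("utf-8")
    return {
        # The output is useful only if the majority of characters were recognized.
        "majority_recognized": recognized_ratio(text) > 0.5,
        # The checksum can be recomputed at a later stage to detect corruption.
        "checksum": hashlib.sha256(data).hexdigest(),
    }


report = validate_output("ab\ufffd d")
```

Comparing the checksum stored at one stage with the one recomputed at the next stage shows whether the output was altered in between.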
Error messages:
Error messages are produced to assist users in operating the system.
Errors are violations of the validation rules or unexpected behavior
of the system. Error messages should be simple and descriptive and
must give an overview of the problem that occurred. Typical error
messages in the case of OCR can be:
Image not found: This error message is to be displayed when the
system is unable to locate the image to be processed.
Invalid format: This message can be displayed when the files are
of unknown format or the file header is broken.
Out of memory: This situation arises when the physical memory
is scarce.
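One way to produce these messages is to translate the exceptions raised during processing, as sketched below. The mapping is an assumption for illustration: in particular, treating ValueError as the signal for a broken or unknown file format is our choice, and the names describe_error and run_safely are hypothetical.

```python
def describe_error(error):
    """Translate an exception into one of the messages listed above."""
    if isinstance(error, FileNotFoundError):
        return "Image not found"
    if isinstance(error, MemoryError):
        return "Out of memory"
    if isinstance(error, ValueError):
        # Assumption: format problems are reported as ValueError.
        return "Invalid format"
    return "Unexpected error: " + str(error)


def run_safely(step, *args):
    """Run one processing step and return (result, error_message)."""
    try:
        return step(*args), None
    except (OSError, MemoryError, ValueError) as error:
        return None, describe_error(error)
```

Each processing step can then be wrapped in run_safely so that the interface shows a short message instead of a stack trace.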
3.6 Interfaces:
Interfaces define a way to interact with the system. Characteristics of
a good interface are:
3.7 There are two popular interfaces:
a) Single complete view: In this type of interface the control
switches are displayed in a window all at once.
b) Tabbed view: In tabbed view, only the control switches related to
the selected option are displayed simultaneously. Tabbed view is
simpler and has greater clarity. For our project we will prefer tabbed
view.
IMPLEMENTATION
4.1 Character Tracer
4.2 Input-file selection
4.3 Input-image for character tracer
4.4 Character Tracer Output
4.5 Handwriting Recognizer Training
4.6 Handwriting recognizer input image:
4.7 Handwriting Recognizer Configuration
5.1 Handwriting recognizer target image
5.2 Handwriting Recognizer processing and output
6. Maintenance
The job of the developer continues even after delivering the product,
in the form of maintenance. Maintenance is necessary to ensure the
proper functioning and allow the system to adapt to ever increasing
needs.
6.1 Types of Maintenance:
a) Fixing
This type of maintenance involves the removal of errors. It can
further be divided into following types:
Corrective: The corrective maintenance involves the
identification and removal of defects. The aim is to remove the
errors.
Adaptive: The adaptive maintenance involves the process of
modifying the software so as to adapt to changes in the runtime
environment, such as a change in OS, hardware or database.
Since this project is built in Java, it will not need any adaptive
maintenance due to a change in OS; only the corresponding JVM needs
to be installed on the new host OS.
b) Enhancing:
This type of maintenance involves increasing the software
functionalities as demanded. It has following sub-types:
Perfective: Changes made due to user request, i.e. when the user
demands any specific changes. This may include changes in layout,
GUI etc.
Preventative: This involves making the system more maintainable.
In OCR, enhancing may involve increasing the type of fonts that can
be recognized and ways in which the recognized fonts may be
represented.
7.3 Documentation and user’s training.
To make the most of the system, its users have to be made aware of
how to exploit the system's functionality as much as
possible. Users need to be trained in the following areas:
a) Hardware requirements: The end user is the one who has to
interact with the system on day-to-day basis. So he must be trained
about the hardware issues, so that he can troubleshoot the minor
problems himself and minimize the risk of damage to the system.
b) Average processing time: User must be aware of the average
time required by the system to complete its processing, so that he
waits for appropriate time before instructing the system to do another
job.
c) Proper input methods: Proper input methods are a must for a
system to work efficiently hence it is necessary for the user to
provide input in desired manner.
Result and conclusion
OCR is the acronym for Optical Character Recognition. This
technology allows a machine to automatically recognize characters
through an optical mechanism. Human beings recognize many
objects in this manner; our eyes are the "optical mechanism." But
while the brain "sees" the input, the ability to comprehend these
signals varies in each person according to many factors. By
reviewing these variables, we can understand the challenges faced by
the technologist developing an OCR system.
First, if we read a page in a language other than our own, we may
recognize the various characters, but be unable to recognize words.
However, on the same page, we are usually able to interpret
numerical statements - the symbols for numbers are universally used.
This explains why many OCR systems recognize numbers only,
while relatively few understand the full alphanumeric character
range.
Second, there is similarity between many numerical and alphabetical
symbol shapes. For example, while examining a string of characters
combining letters and numbers, there is very little visible difference
between a capital letter "O" and the numeral "0." As humans, we can
re-read the sentence or entire paragraph to help us determine the
accurate meaning. This procedure, however, is much more difficult
for a machine.
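A machine can imitate a small part of this contextual re-reading. The sketch below is a simple heuristic of our own, not a technique described in this paper: inside one token, an ambiguous "O" or "0" is resolved from its unambiguous neighbors, becoming a zero between digits and a letter between letters. The function name resolve_o_zero is hypothetical.

```python
def resolve_o_zero(token):
    """Resolve ambiguous "O"/"0" characters from their immediate neighbors."""
    chars = list(token)
    for i, ch in enumerate(chars):
        if ch not in "O0":
            continue
        # Only neighbors that are not themselves ambiguous count as context.
        neighbours = [
            chars[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(chars) and chars[j] not in "O0"
        ]
        if neighbours and all(n.isdigit() for n in neighbours):
            chars[i] = "0"
        elif neighbours and all(n.isalpha() for n in neighbours):
            chars[i] = "O"
    return "".join(chars)
```

A token with mixed or missing context is left unchanged, which shows how little such a local rule can do compared with a human reader who understands the whole sentence.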
Third, we rely on contrast to help us recognize characters. We may
find it very difficult to read text which appears against a very dark
background, or is printed over other words or graphics. Again,
programming a system to interpret only the relevant data and
disregard the rest is a difficult task for OCR engineers.
There are many other problems which challenge the developers of
OCR systems. In this paper, we will review the history,
advancements, abilities and limitations of existing systems. This
analysis should help determine if OCR is the correct application for
your company's needs, and if so, which type of system to implement.
Opticalcharacter recognition

More Related Content

What's hot

Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...
Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...
Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...inventionjournals
 
Signature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertatiSignature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertatiDr. Vinayak Bharadi
 
A Review of Optical Character Recognition System for Recognition of Printed Text
A Review of Optical Character Recognition System for Recognition of Printed TextA Review of Optical Character Recognition System for Recognition of Printed Text
A Review of Optical Character Recognition System for Recognition of Printed Textiosrjce
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...iosrjce
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHarshana Madusanka Jayamaha
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition systemVijay Apurva
 
Automated License Plate Recognition for Toll Booth Application
Automated License Plate Recognition for Toll Booth ApplicationAutomated License Plate Recognition for Toll Booth Application
Automated License Plate Recognition for Toll Booth ApplicationIJERA Editor
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character RecognitionDurjoy Saha
 
Paper id 25201447
Paper id 25201447Paper id 25201447
Paper id 25201447IJRAT
 
Optical Character Recognition from Text Image
Optical Character Recognition from Text ImageOptical Character Recognition from Text Image
Optical Character Recognition from Text ImageEditor IJCATR
 
A Survey on Tamil Handwritten Character Recognition using OCR Techniques
A Survey on Tamil Handwritten Character Recognition using OCR TechniquesA Survey on Tamil Handwritten Character Recognition using OCR Techniques
A Survey on Tamil Handwritten Character Recognition using OCR Techniquescscpconf
 
Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...
Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...
Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...IOSR Journals
 
OPTICAL CHARACTER RECOGNITION USING RBFNN
OPTICAL CHARACTER RECOGNITION USING RBFNNOPTICAL CHARACTER RECOGNITION USING RBFNN
OPTICAL CHARACTER RECOGNITION USING RBFNNAM Publications
 
A Simple Signature Recognition System
A Simple Signature Recognition System A Simple Signature Recognition System
A Simple Signature Recognition System iosrjce
 
Successive Geometric Center Based Dynamic Signature Recognition
Successive Geometric Center Based Dynamic Signature RecognitionSuccessive Geometric Center Based Dynamic Signature Recognition
Successive Geometric Center Based Dynamic Signature RecognitionDr. Vinayak Bharadi
 

What's hot (18)

Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...
Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...
Mobile Based Application to Scan the Number Plate and To Verify the Owner Det...
 
Signature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertatiSignature recognition using clustering techniques dissertati
Signature recognition using clustering techniques dissertati
 
A Review of Optical Character Recognition System for Recognition of Printed Text
A Review of Optical Character Recognition System for Recognition of Printed TextA Review of Optical Character Recognition System for Recognition of Printed Text
A Review of Optical Character Recognition System for Recognition of Printed Text
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural network
 
Offline Signature Verification and Recognition using Neural Network
Offline Signature Verification and Recognition using Neural NetworkOffline Signature Verification and Recognition using Neural Network
Offline Signature Verification and Recognition using Neural Network
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition system
 
Automated License Plate Recognition for Toll Booth Application
Automated License Plate Recognition for Toll Booth ApplicationAutomated License Plate Recognition for Toll Booth Application
Automated License Plate Recognition for Toll Booth Application
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character Recognition
 
Paper id 25201447
Paper id 25201447Paper id 25201447
Paper id 25201447
 
Optical Character Recognition from Text Image
Optical Character Recognition from Text ImageOptical Character Recognition from Text Image
Optical Character Recognition from Text Image
 
A Survey on Tamil Handwritten Character Recognition using OCR Techniques
A Survey on Tamil Handwritten Character Recognition using OCR TechniquesA Survey on Tamil Handwritten Character Recognition using OCR Techniques
A Survey on Tamil Handwritten Character Recognition using OCR Techniques
 
Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...
Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...
Digital Pen for Handwritten Digit and Gesture Recognition Using Trajectory Re...
 
OPTICAL CHARACTER RECOGNITION USING RBFNN
OPTICAL CHARACTER RECOGNITION USING RBFNNOPTICAL CHARACTER RECOGNITION USING RBFNN
OPTICAL CHARACTER RECOGNITION USING RBFNN
 
A Simple Signature Recognition System
A Simple Signature Recognition System A Simple Signature Recognition System
A Simple Signature Recognition System
 
Handwritten Character Recognition
Handwritten Character RecognitionHandwritten Character Recognition
Handwritten Character Recognition
 
Z04405149151
Z04405149151Z04405149151
Z04405149151
 
Successive Geometric Center Based Dynamic Signature Recognition
Successive Geometric Center Based Dynamic Signature RecognitionSuccessive Geometric Center Based Dynamic Signature Recognition
Successive Geometric Center Based Dynamic Signature Recognition
 

Viewers also liked

Viewers also liked (7)

El nas aventurer
El nas aventurerEl nas aventurer
El nas aventurer
 
Fp me reporte 1 aplicación.lucy suarez
Fp me reporte 1 aplicación.lucy suarezFp me reporte 1 aplicación.lucy suarez
Fp me reporte 1 aplicación.lucy suarez
 
Index page of lab file
Index page of lab fileIndex page of lab file
Index page of lab file
 
Mindfulness afonso x
Mindfulness afonso xMindfulness afonso x
Mindfulness afonso x
 
Diapositiva spa - para niñas
Diapositiva spa - para niñasDiapositiva spa - para niñas
Diapositiva spa - para niñas
 
Canteen project
Canteen projectCanteen project
Canteen project
 
College management system ppt
College management system pptCollege management system ppt
College management system ppt
 

Similar to Opticalcharacter recognition

Optical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyOptical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyEr. Ashish Pandey
 
Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)IJERA Editor
 
Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Vidyut Singhania
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) Systemiosrjce
 
Optical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalOptical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalBiniam Asnake
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfSachin414679
 
A Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character RecognitionA Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character Recognitioniosrjce
 
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET Journal
 
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...IRJET Journal
 
19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx
19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx
19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptxSamridhGarg
 
Implementation of Computer Vision Applications using OpenCV in C++
Implementation of Computer Vision Applications using OpenCV in C++Implementation of Computer Vision Applications using OpenCV in C++
Implementation of Computer Vision Applications using OpenCV in C++IRJET Journal
 
DevanagiriOCR on CELL BROADBAND ENGINE
DevanagiriOCR on CELL BROADBAND ENGINEDevanagiriOCR on CELL BROADBAND ENGINE
DevanagiriOCR on CELL BROADBAND ENGINEPridhvi Kodamasimham
 

Similar to Opticalcharacter recognition (20)

E017322833
E017322833E017322833
E017322833
 
Optical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyOptical character recognition IEEE Paper Study
Optical character recognition IEEE Paper Study
 
A12REVIEW.pptx
A12REVIEW.pptxA12REVIEW.pptx
A12REVIEW.pptx
 
Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)Character Recognition (Devanagari Script)
Character Recognition (Devanagari Script)
 
Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Final Report on Optical Character Recognition
Final Report on Optical Character Recognition
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) System
 
D017222226
D017222226D017222226
D017222226
 
Optical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based RetrievalOptical Character Recognition (OCR) based Retrieval
Optical Character Recognition (OCR) based Retrieval
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
 
A Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character RecognitionA Review on Geometrical Analysis in Character Recognition
A Review on Geometrical Analysis in Character Recognition
 
I017256165
I017256165I017256165
I017256165
 
Assignment-1-NF.docx
Assignment-1-NF.docxAssignment-1-NF.docx
Assignment-1-NF.docx
 
Ocr 1
Ocr 1Ocr 1
Ocr 1
 
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...IRJET- 	  Document Layout analysis using Inverse Support Vector Machine (I-SV...
IRJET- Document Layout analysis using Inverse Support Vector Machine (I-SV...
 
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
Document Layout analysis using Inverse Support Vector Machine (I-SVM) for Hin...
 
19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx
19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx
19BCS1815_PresentationAutomatic Number Plate Recognition(ANPR)P.pptx
 
Implementation of Computer Vision Applications using OpenCV in C++
Implementation of Computer Vision Applications using OpenCV in C++Implementation of Computer Vision Applications using OpenCV in C++
Implementation of Computer Vision Applications using OpenCV in C++
 
Artificial Neural Network Based Offline Signature Recognition System Using Lo...
Artificial Neural Network Based Offline Signature Recognition System Using Lo...Artificial Neural Network Based Offline Signature Recognition System Using Lo...
Artificial Neural Network Based Offline Signature Recognition System Using Lo...
 
DevanagiriOCR on CELL BROADBAND ENGINE
DevanagiriOCR on CELL BROADBAND ENGINEDevanagiriOCR on CELL BROADBAND ENGINE
DevanagiriOCR on CELL BROADBAND ENGINE
 
journal nakk
journal nakkjournal nakk
journal nakk
 

Optical character recognition

  • 1. Optical character recognition (OCR)
Shobhit Saxena, Amity University, Saxenashobhit1988@gmail.com
Nidhi Sharma, Amity University, nidhi9392@gmail.com

Abstract: Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text. It provides full alphanumeric recognition of printed or handwritten characters at electronic speed simply by scanning the form. It is widely used as a form of data entry from an original paper data source, whether documents, sales receipts, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically searched, stored more compactly, displayed on-line, and used in machine processes such as machine translation, text-to-speech and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. More recently, the term Intelligent Character Recognition (ICR) has been used to describe the process of interpreting image data, in particular alphanumeric text.

I. INTRODUCTION
The area of Optical Character Recognition (OCR) involves locating the characters in an image and converting them into text files. The characters in the image cannot be processed as such and need to be represented in a suitable character coding.
With computers available at low cost and digital data being convenient to handle, everyone aims to store data in digital form. Hard copies of text documents are stored digitally by scanning them as image files. These images do not support text-based operations such as editing or summarizing, and it is quite tedious to feed this data into computer systems manually. This is where OCR comes into play.

II. PROBLEMS AND MOTIVATION
Currently, OCR is available as a beta product (a product in the experimental stage) and research is still being carried out in this field. OCR draws on artificial intelligence, which is itself an active research topic. The main problem for an OCR system is to interpret the images of characters correctly. This procedure makes use of pattern classification algorithms. Several algorithms are available, and others are being formulated, from which one can be chosen for an OCR implementation.

III. APPROACH AND GOAL OF THIS PROJECT
The goal of this project is to implement OCR using the LVQ (Learning Vector Quantization) algorithm. This project uses the supervised learning approach to pattern classification. First the image is examined to detect the possible presence of a character, and if a character is found, it is associated with a suitable character code. The main steps involved in OCR are:

A. Pre-processing
The digitized images are usually in gray tone, and for a clear document a simple histogram-based threshold approach is sufficient for converting them to two-tone images. The histogram of the gray values of the pixels shows two prominent peaks, and a middle gray value located between the peaks is a good choice for the threshold.
For salt and pepper noise we generally use a median filter. A median filter replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel (the original value of the pixel is included in the computation of the median). Median filters provide excellent noise reduction, with considerably less blurring than linear smoothing filters of similar size.
(The image illustrated below is only an example.)
Fig 1.1 Image with salt and pepper noise
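The pre-processing steps described above (median filtering followed by a histogram-based threshold) can be sketched in a few lines. This is only an illustrative sketch, not the project's implementation: the function names are hypothetical, the image is assumed to be a list of rows of 8-bit gray values, the neighborhood is assumed to be 3x3, and the two histogram peaks are assumed to lie on either side of gray level 128.

```python
from statistics import median

def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood.

    Border pixels are copied unchanged (a simplifying assumption).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # The window includes the pixel itself, as described in the text.
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

def midpoint_threshold(img):
    """Pick the gray value midway between the dark and bright histogram peaks.

    Assumption: one peak lies below gray level 128 and the other at or above it.
    """
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    dark_peak = max(range(0, 128), key=lambda g: hist[g])
    bright_peak = max(range(128, 256), key=lambda g: hist[g])
    return (dark_peak + bright_peak) // 2

def binarize(img, threshold):
    """Map pixels below the threshold to 0 (black) and the rest to 255 (white)."""
    return [[0 if p < threshold else 255 for p in row] for row in img]
```

A typical use would be `binarize(clean, midpoint_threshold(clean))` with `clean = median_filter_3x3(scan)`, so that isolated noise pixels are removed before the two-tone conversion.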
  • 2. Fig 1.2 Image without salt and pepper noise

B. Segmentation
Segmentation is one of the most important phases of an OCR system. Good segmentation techniques increase the performance of OCR. Segmentation subdivides an image into its constituent regions or objects. In this phase we try to extract the basic constituents of the script, which are the characters. This is needed because our classifier recognizes only these characters. The segmentation phase is also a major source of recognition error, because of touching characters, which the classifier cannot handle properly. Even in good-quality documents, some adjacent characters touch each other because of an inappropriate scanning resolution.
  • Segmentation of Lines: Text lines are detected by horizontal scanning. We scan the document page horizontally from the top and find the last row containing only white pixels before a black pixel is found. Then we find the first row containing only white pixels just after the end of the black pixels. We repeat this process over the entire page to find all lines.
  • Segmentation of Words: After finding a particular line we separate the individual words. This is done by vertical scanning.
  • Segmentation of Individual Characters: Once we have the words, we segment them into individual characters. Before doing so, we locate the head line by finding the rows with the maximum number of black pixels in a word. After locating the head line we remove it, i.e. convert it to white pixels. With the head line removed, the word is divided into three horizontal parts known as the upper zone, middle zone and lower zone. Individual characters are separated from each zone by vertical scanning.
Fig 1.3 Output of the segmentation

Classification
Classification is performed based on the extracted features. For initial classification of characters, we consider three features:
  • Mean distance
  • Histogram of projection based on pixel value
  • Histogram of projection based on spatial position of pixel
Feature Extraction: Feature extraction is one of the most important steps in developing a classification system. This step describes the features we selected for classifying the characters.
Fig 1.4 Control flow of OCR
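The horizontal and vertical scanning used for segmentation, and the head-line search, can be sketched with projection profiles (counts of black pixels per row or per column). This is a minimal sketch under stated assumptions: the page is a binary image given as a list of rows with 1 for black and 0 for white, all function names are hypothetical, and telling word gaps apart from character gaps (which would need a gap-width threshold) is left out.

```python
def runs_of_ink(profile):
    """Return (start, end) pairs, end exclusive, of maximal runs where profile > 0."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                    # first row/column containing black pixels
        elif count == 0 and start is not None:
            runs.append((start, i))      # first all-white row/column after the ink
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_lines(binary):
    """Horizontal scanning: text lines are runs of rows that contain black pixels."""
    return runs_of_ink([sum(row) for row in binary])

def segment_columns(binary, top, bottom):
    """Vertical scanning inside rows [top, bottom): runs of columns with black pixels."""
    width = len(binary[0])
    profile = [sum(binary[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs_of_ink(profile)

def find_head_line(binary, top, bottom):
    """Row in [top, bottom) with the most black pixels (taken as the head line)."""
    return max(range(top, bottom), key=lambda y: sum(binary[y]))

def remove_head_line(binary, row):
    """Return a copy of the image with the given row converted to white pixels."""
    return [[0] * len(r) if y == row else r[:] for y, r in enumerate(binary)]
```

The same row and column profiles could also serve as the projection histograms listed among the classification features, although the source does not specify how those features are computed.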
  • 3. C. Where are we today?
The advent of the array method of scanning, coupled with higher speeds and more compact computing power, has led to the concept of "image processing". Image processing does not have to use optical recognition to be successful. For example, the ability to convert any document to an electronically digitized item may effectively replace microfilm devices. This gives the user a much more convenient method of sorting images than handling actual documents or microfilm pictures. Image processing relies on larger, more complex arrays than early third-generation OCR scanners.

COMPARISON TABLE OF OCR AND OMR
ITEM                                      OCR   OMR
Handprint recognition                     Y     N
Machine print recognition                 Y     N
Recognition of checks and "X"s            Y     Y
Requires timing tracks/form IDs           N     Y
Requires registration marks               Y     N
Electronic image storage and retrieval    Y     N

D. Design objectives:
The design objectives are the key points to be considered in designing the software. Some of the important design issues are:
System must be user friendly: A system is no good if it does not ease the work of its operator and cannot be used easily. User-friendly systems are easy to use and adapt to.
System must make the task comprehensible: A system must have clear objectives and should be capable of doing exactly what it is told.
System must be transparent: The working of the system must be clear to the user so that it can easily be modified and troubleshot.

2. Acquiring input data:
Acquiring input data covers the issues in feeding data in for processing. Input data can be acquired by two methods:
Scanning the image: This method involves the use of scanners.
Obtaining a pre-scanned image: In this method an existing image is fed to the OCR system.

3. Components of OCR:
3.1 Character Tracer: This component locates characters in the image.
3.2 Mean Square Error Recognizer: Recognizes characters of the target image with the help of a training image.
3.3 Handwriting Recognizer: This feature learns to recognize the handwriting of an individual. It has the following subcomponents:
a) Training image: This image is used as the reference image.
b) Configure: This component chooses the recognizer process, either MSE or the character aspect ratio analyzer.
c) Process: This feature processes the target image.
3.4 Validation rules: Validation rules are used to ensure the proper functioning of any system. They can be applied to both input and output.
Validation rules for the image fed to the OCR system:
Format: The input image should be in a proper format; images stored as raster graphics are generally easy to process and interpret.
Memory size: The image should be within the proper memory size limits.
Resolution: The resolution of the image determines its quality and dimensions. The image should be of appropriate resolution (400x400).
Availability: The image must be present at the location specified and must exist at the time of processing by the system.
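The idea behind the Mean Square Error Recognizer (3.2) can be illustrated with a small template-matching sketch. The source does not describe how the component is implemented, so everything here is an assumption: glyphs are equal-sized grids of pixel values, the training image is assumed to have been cut into one labelled template per character, and the names `mean_square_error` and `recognize` are hypothetical.

```python
def mean_square_error(a, b):
    """Average squared pixel difference between two equal-sized glyph images."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("glyphs must have the same dimensions")
    total = sum((pa - pb) ** 2
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def recognize(glyph, templates):
    """Return the label of the training template with the smallest MSE to the glyph.

    templates maps a character label to its reference glyph (hypothetical layout).
    """
    return min(templates, key=lambda label: mean_square_error(glyph, templates[label]))
```

In practice the segmented characters would first have to be scaled to the template size, which this sketch leaves out.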
  • 4. 3.5 Validation rules for the output:
Once the input has been fed in properly and processed, the output should be validated before it is presented to the user. Some common validation rules for the output are:
Character encoding: The characters of the output file should be encoded using a proper method. The most commonly used encoding is Unicode, because it can represent the characters of nearly every language.
Number of characters generated: Enough characters must be generated to make the output meaningful. The output is of no use if the OCR system cannot recognize the majority of the characters.
Format of output file: The output file should be in a format that suits the user. The output file will be checked for errors and consistency using checksums calculated at various stages of processing.
Error messages: Error messages are produced to assist users in operating the system. Errors are violations of the validation rules or unexpected behavior of the system. Error messages should be simple and descriptive and must give an overview of the problem that occurred. Typical error messages for OCR are:
Image not found: Displayed when the system is unable to locate the image to be processed.
Invalid format: Displayed when the file is of an unknown format or the file header is broken.
Out of memory: This situation arises when physical memory is scarce.

3.6 Interfaces:
Interfaces define a way to interact with the system. Characteristics of a good interface are:
3.7 There are two popular interfaces:
a) Single complete view: In this type of interface the control switches are displayed in a window all at once.
b) Tabbed view: In tabbed view, only the control switches related to the selected option are displayed at a time. Tabbed view is simpler and has greater clarity. For our project we prefer tabbed view.

IMPLEMENTATION
4.1 Character Tracer
4.2 Input-file selection
4.3 Input image for character tracer
  • 5. 4.4 Character Tracer output
4.5 Handwriting Recognizer training
4.6 Handwriting Recognizer input image
4.7 Handwriting Recognizer configuration
5.1 Handwriting Recognizer target image
5.2 Handwriting Recognizer processing and output

6. Maintenance
The job of the developer continues even after delivering the product, in the form of maintenance. Maintenance is necessary to ensure proper functioning and to allow the system to adapt to ever-increasing needs.
6.1 Types of Maintenance:
a) Fixing: This type of maintenance involves the removal of errors. It can be further divided into the following types:
Corrective: Corrective maintenance involves the identification and removal of defects.
Adaptive: Adaptive maintenance involves modifying the software to adapt to changes in the runtime environment, such as a change of OS, hardware or database. Since this project is built in Java, it will not need adaptive maintenance for a change of OS; only the corresponding JVM needs to be installed on the new host OS.
b) Enhancing: This type of maintenance involves extending the software's functionality as demanded. It has the following sub-types:
Perfective: Changes made at the user's request, i.e. when the user demands specific changes. This may include changes in layout, GUI, etc.
  • 6. Preventative: This involves making the system more maintainable. In OCR, enhancing may involve increasing the number of fonts that can be recognized and the ways in which the recognized fonts may be represented.

7.3 Documentation and user's training
To make the most of the system, its users have to be made aware of how to exploit its functionality as fully as possible. Users need to be trained in the following areas:
a) Hardware requirements: The end user is the one who interacts with the system on a day-to-day basis, so users must be trained in the hardware issues so that they can troubleshoot minor problems themselves and minimize the risk of damage to the system.
b) Average processing time: Users must be aware of the average time the system needs to complete its processing, so that they wait an appropriate time before instructing the system to do another job.
c) Proper input methods: Proper input methods are essential for the system to work efficiently, so users must provide input in the desired manner.

Result and conclusion
OCR is the acronym for Optical Character Recognition. This technology allows a machine to recognize characters automatically through an optical mechanism. Human beings recognize many objects in this manner; our eyes are the "optical mechanism." But while the brain "sees" the input, the ability to comprehend these signals varies from person to person according to many factors. By reviewing these variables, we can understand the challenges faced by the technologist developing an OCR system.
First, if we read a page in a language other than our own, we may recognize the various characters but be unable to recognize words. However, on the same page, we are usually able to interpret numerical statements, because the symbols for numbers are universally used. This explains why many OCR systems recognize numbers only, while relatively few understand the full alphanumeric character range.
Second, many numerical and alphabetical symbol shapes are similar. For example, in a string of characters combining letters and numbers, there is very little visible difference between a capital letter "O" and the numeral "0." As humans, we can re-read the sentence or the entire paragraph to help us determine the accurate meaning. This procedure, however, is much more difficult for a machine.
Third, we rely on contrast to help us recognize characters. We may find it very difficult to read text that appears against a very dark background, or is printed over other words or graphics. Again, programming a system to interpret only the relevant data and disregard the rest is a difficult task for OCR engineers.
There are many other problems that challenge the developers of OCR systems. In this paper, we review the history, advancements, abilities and limitations of existing systems. This analysis should help determine whether OCR is the correct application for your company's needs and, if so, which type of system to implement.