SlideShare une entreprise Scribd logo
1  sur  11
Tesseract OCR Engine
Contents 
• Introduction & history of OCR 
• Tesseract architecture & methods 
• Announcing Tesseract 2.00 
• Training Tesseract 
• Future enhancements
A Brief History of OCR 
• What is Optical Character Recognition? 
• A Brief History of OCR 
• OCR predates electronic computers!
A Brief History of OCR 
• 1929 – Digit recognition machine 
• 1953 – Alphanumeric recognition machine 
• 1965 – US Mail sorting 
• 1965 – British banking system 
• 1976 – Kurzweil reading machine 
• 1985 – Hardware-assisted PC software 
• 1988 – Software-only PC software 
• 1994-2000 – Industry consolidation
Tesseract Background 
• Developed on HP-UX at HP between 1985 
• and 1994 to run in a desktop scanner. 
• Came neck and neck with Caere and XIS 
• in the 1995 UNLV test. 
• (See http://www.isri.unlv.edu/downloads/AT-1995.pdf ) 
• Never used in an HP product. 
• Open sourced in 2005. Now on: 
• http://code.google.com/p/tesseract-ocr 
• Highly portable.
Tesseract OCR Architecture 
• Baselines are rarely perfectly straight 
• Text Line Finding – skew independent – 
• published at ICDAR’95 Montreal. 
• (http://scholar.google.com/scholar?q=skew+detection+smith) 
• Baselines are approximated by quadratic splines 
• to account for skew and curl. 
• Meanline, ascender and descender lines are a 
• constant displacement from baseline. 
• Critical value is the x-height.
Spaces between words are tricky too 
• Italics, digits, punctuation all create special-case font-dependent 
spacing. 
• • Fully justified text in narrow columns can have vastly varying 
spacing on different lines. 
• Tesseract: Recognize Word 
• Outline Approximation 
• Polygonal approximation is a double-edged sword. 
• Noise and some pertinent information are both lost.
Tesseract: Features and Matching 
• Static classifier uses outline fragments as features. Broken characters 
are easily recognizable by a small->large matching process in classifier. 
(This is slow.) 
• Adaptive classifier uses the same technique! 
• (Apart from normalization method)
Announcing tesseract-2.00 
• Fully Unicode (UTF-8) capable 
• Already trained for 6 Latin-based 
• languages (Eng, Fra, Ita, Deu, Spa, Nld) 
• Code and documented process to train at 
• http://code.google.com/p/tesseract-ocr 
• UNLV regression test framework 
• Other minor fixes
Commercial OCR v Tesseract 
• Page layout analysis. 
• More languages. 
• Improve accuracy. 
• Add a UI.

Contenu connexe

Tendances

Optical Character Recognition (OCR)
Optical Character Recognition (OCR)Optical Character Recognition (OCR)
Optical Character Recognition (OCR)Vidyut Singhania
 
Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Karan Panjwani
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Text to Speech for Mobile Voice
Text to Speech for Mobile Voice Text to Speech for Mobile Voice
Text to Speech for Mobile Voice June Hostetter
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition systemVijay Apurva
 
Optical character recognition (ocr) ppt
Optical character recognition (ocr) pptOptical character recognition (ocr) ppt
Optical character recognition (ocr) pptDeijee Kalita
 
Sign language translator ieee power point
Sign language translator ieee power pointSign language translator ieee power point
Sign language translator ieee power pointMadhuri Yellapu
 
Handwriting Recognition
Handwriting RecognitionHandwriting Recognition
Handwriting RecognitionBindu Karki
 
Character Recognition using Machine Learning
Character Recognition using Machine LearningCharacter Recognition using Machine Learning
Character Recognition using Machine LearningRitwikSaurabh1
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySeminar Links
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character RecognitionDurjoy Saha
 
Sign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols ClassificationSign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols ClassificationTriloki Gupta
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...iosrjce
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speechBilgin Aksoy
 
Artificial intelligence in speech recognition
Artificial intelligence in speech recognitionArtificial intelligence in speech recognition
Artificial intelligence in speech recognitionRajanivetha G
 

Tendances (20)

Optical Character Recognition (OCR)
Optical Character Recognition (OCR)Optical Character Recognition (OCR)
Optical Character Recognition (OCR)
 
Optical Character Recognition( OCR )
Optical Character Recognition( OCR )Optical Character Recognition( OCR )
Optical Character Recognition( OCR )
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Text to Speech for Mobile Voice
Text to Speech for Mobile Voice Text to Speech for Mobile Voice
Text to Speech for Mobile Voice
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition system
 
Ocr abstract
Ocr abstractOcr abstract
Ocr abstract
 
Optical character recognition (ocr) ppt
Optical character recognition (ocr) pptOptical character recognition (ocr) ppt
Optical character recognition (ocr) ppt
 
Sign language translator ieee power point
Sign language translator ieee power pointSign language translator ieee power point
Sign language translator ieee power point
 
Handwriting Recognition
Handwriting RecognitionHandwriting Recognition
Handwriting Recognition
 
Character Recognition using Machine Learning
Character Recognition using Machine LearningCharacter Recognition using Machine Learning
Character Recognition using Machine Learning
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character Recognition
 
Sign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols ClassificationSign Language Recognition based on Hands symbols Classification
Sign Language Recognition based on Hands symbols Classification
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speech
 
Artificial intelligence in speech recognition
Artificial intelligence in speech recognitionArtificial intelligence in speech recognition
Artificial intelligence in speech recognition
 

En vedette

Tesseract OCR Engine - OpenFest 2009
Tesseract OCR Engine - OpenFest 2009Tesseract OCR Engine - OpenFest 2009
Tesseract OCR Engine - OpenFest 2009Svetlin Nakov
 
As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...
As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...
As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...Christos Demetriou
 
基于Python构建可扩展的自动化运维平台
基于Python构建可扩展的自动化运维平台基于Python构建可扩展的自动化运维平台
基于Python构建可扩展的自动化运维平台liuts
 
Introduction to python for Beginners
Introduction to python for Beginners Introduction to python for Beginners
Introduction to python for Beginners Sujith Kumar
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python amiable_indian
 
Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)Paige Bailey
 
Déposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HALDéposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HALOAccsd
 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to PythonNowell Strite
 

En vedette (18)

OCR using Tesseract
OCR using TesseractOCR using Tesseract
OCR using Tesseract
 
Tamil OCR using Tesseract OCR Engine
Tamil OCR using Tesseract OCR EngineTamil OCR using Tesseract OCR Engine
Tamil OCR using Tesseract OCR Engine
 
Tasract OCR
Tasract OCRTasract OCR
Tasract OCR
 
Tesseract OCR Engine - OpenFest 2009
Tesseract OCR Engine - OpenFest 2009Tesseract OCR Engine - OpenFest 2009
Tesseract OCR Engine - OpenFest 2009
 
Text Detection and Recognition
Text Detection and RecognitionText Detection and Recognition
Text Detection and Recognition
 
ocr
ocrocr
ocr
 
As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...
As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...
As Ict (Ocr) G061 3.1.6 Application Software used for the Presentation & Comm...
 
基于Python构建可扩展的自动化运维平台
基于Python构建可扩展的自动化运维平台基于Python构建可扩展的自动化运维平台
基于Python构建可扩展的自动化运维平台
 
Introduction to python for Beginners
Introduction to python for Beginners Introduction to python for Beginners
Introduction to python for Beginners
 
Introduction to Python
Introduction to Python Introduction to Python
Introduction to Python
 
Scalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and TesseractScalable OCR with NiFi and Tesseract
Scalable OCR with NiFi and Tesseract
 
Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)Python 101: Python for Absolute Beginners (PyTexas 2014)
Python 101: Python for Absolute Beginners (PyTexas 2014)
 
Basics of-optical-character-recognition
Basics of-optical-character-recognitionBasics of-optical-character-recognition
Basics of-optical-character-recognition
 
Raspberry pi
Raspberry pi Raspberry pi
Raspberry pi
 
Python Presentation
Python PresentationPython Presentation
Python Presentation
 
Déposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HALDéposer une thèse dans TEL ou HAL
Déposer une thèse dans TEL ou HAL
 
Introduction to Python
Introduction to PythonIntroduction to Python
Introduction to Python
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Similaire à Tesseract OCR Engine

Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Utilizing the Pre-trained Model Effectively for Speech Translation
Utilizing the Pre-trained Model Effectively for Speech TranslationUtilizing the Pre-trained Model Effectively for Speech Translation
Utilizing the Pre-trained Model Effectively for Speech TranslationChen Xu
 
Open Source SQL Databases
Open Source SQL DatabasesOpen Source SQL Databases
Open Source SQL DatabasesEmanuel Calvo
 
ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015Guilherme Vaz
 
From legacy, to batch, to near real-time
From legacy, to batch, to near real-timeFrom legacy, to batch, to near real-time
From legacy, to batch, to near real-timeDani Solà Lagares
 
CBDW2014 - Down the RabbitMQ hole with ColdFusion
CBDW2014 - Down the RabbitMQ hole with ColdFusionCBDW2014 - Down the RabbitMQ hole with ColdFusion
CBDW2014 - Down the RabbitMQ hole with ColdFusionOrtus Solutions, Corp
 
From legacy, to batch, to near real-time
From legacy, to batch, to near real-timeFrom legacy, to batch, to near real-time
From legacy, to batch, to near real-timeMarc Sturlese
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchNatasha Latysheva
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyRobert Viseur
 
Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16allingeek
 
OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima as af
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Ricard Clau
 
Whole Platform LWC11 Submission
Whole Platform LWC11 SubmissionWhole Platform LWC11 Submission
Whole Platform LWC11 SubmissionRiccardo Solmi
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012Eonblast
 
Hunting for anglerfish in datalakes
Hunting for anglerfish in datalakesHunting for anglerfish in datalakes
Hunting for anglerfish in datalakesDominic Egger
 

Similaire à Tesseract OCR Engine (20)

Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Utilizing the Pre-trained Model Effectively for Speech Translation
Utilizing the Pre-trained Model Effectively for Speech TranslationUtilizing the Pre-trained Model Effectively for Speech Translation
Utilizing the Pre-trained Model Effectively for Speech Translation
 
Enterprise messaging
Enterprise messagingEnterprise messaging
Enterprise messaging
 
Open Source SQL Databases
Open Source SQL DatabasesOpen Source SQL Databases
Open Source SQL Databases
 
ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015
 
From legacy, to batch, to near real-time
From legacy, to batch, to near real-timeFrom legacy, to batch, to near real-time
From legacy, to batch, to near real-time
 
CBDW2014 - Down the RabbitMQ hole with ColdFusion
CBDW2014 - Down the RabbitMQ hole with ColdFusionCBDW2014 - Down the RabbitMQ hole with ColdFusion
CBDW2014 - Down the RabbitMQ hole with ColdFusion
 
From legacy, to batch, to near real-time
From legacy, to batch, to near real-timeFrom legacy, to batch, to near real-time
From legacy, to batch, to near real-time
 
Verification with LoLA: 1 Basics
Verification with LoLA: 1 BasicsVerification with LoLA: 1 Basics
Verification with LoLA: 1 Basics
 
Building a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From ScratchBuilding a Neural Machine Translation System From Scratch
Building a Neural Machine Translation System From Scratch
 
Deep Learning Summit (DLS01-4)
Deep Learning Summit (DLS01-4)Deep Learning Summit (DLS01-4)
Deep Learning Summit (DLS01-4)
 
Introduction to libre « fulltext » technology
Introduction to libre « fulltext » technologyIntroduction to libre « fulltext » technology
Introduction to libre « fulltext » technology
 
Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16Getting Deep on Orchestration - Nickoloff - DockerCon16
Getting Deep on Orchestration - Nickoloff - DockerCon16
 
OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima OCR by Abdullah Ahmed Abu Rtima
OCR by Abdullah Ahmed Abu Rtima
 
Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015Modern software architectures - PHP UK Conference 2015
Modern software architectures - PHP UK Conference 2015
 
2018 12-kube con-ballerinacon
2018 12-kube con-ballerinacon2018 12-kube con-ballerinacon
2018 12-kube con-ballerinacon
 
Whole Platform LWC11 Submission
Whole Platform LWC11 SubmissionWhole Platform LWC11 Submission
Whole Platform LWC11 Submission
 
VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012VoltDB and Erlang - Tech planet 2012
VoltDB and Erlang - Tech planet 2012
 
Hunting for anglerfish in datalakes
Hunting for anglerfish in datalakesHunting for anglerfish in datalakes
Hunting for anglerfish in datalakes
 
Raptor codes
Raptor codesRaptor codes
Raptor codes
 

Plus de Raghu nath

Ftp (file transfer protocol)
Ftp (file transfer protocol)Ftp (file transfer protocol)
Ftp (file transfer protocol)Raghu nath
 
Javascript part1
Javascript part1Javascript part1
Javascript part1Raghu nath
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsRaghu nath
 
Selection sort
Selection sortSelection sort
Selection sortRaghu nath
 
Binary search
Binary search Binary search
Binary search Raghu nath
 
JSON(JavaScript Object Notation)
JSON(JavaScript Object Notation)JSON(JavaScript Object Notation)
JSON(JavaScript Object Notation)Raghu nath
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithmsRaghu nath
 
Step by step guide to install dhcp role
Step by step guide to install dhcp roleStep by step guide to install dhcp role
Step by step guide to install dhcp roleRaghu nath
 
Network essentials chapter 4
Network essentials  chapter 4Network essentials  chapter 4
Network essentials chapter 4Raghu nath
 
Network essentials chapter 3
Network essentials  chapter 3Network essentials  chapter 3
Network essentials chapter 3Raghu nath
 
Network essentials chapter 2
Network essentials  chapter 2Network essentials  chapter 2
Network essentials chapter 2Raghu nath
 
Network essentials - chapter 1
Network essentials - chapter 1Network essentials - chapter 1
Network essentials - chapter 1Raghu nath
 
Python chapter 2
Python chapter 2Python chapter 2
Python chapter 2Raghu nath
 
python chapter 1
python chapter 1python chapter 1
python chapter 1Raghu nath
 
Linux Shell Scripting
Linux Shell ScriptingLinux Shell Scripting
Linux Shell ScriptingRaghu nath
 

Plus de Raghu nath (20)

Mongo db
Mongo dbMongo db
Mongo db
 
Ftp (file transfer protocol)
Ftp (file transfer protocol)Ftp (file transfer protocol)
Ftp (file transfer protocol)
 
MS WORD 2013
MS WORD 2013MS WORD 2013
MS WORD 2013
 
Msword
MswordMsword
Msword
 
Ms word
Ms wordMs word
Ms word
 
Javascript part1
Javascript part1Javascript part1
Javascript part1
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Selection sort
Selection sortSelection sort
Selection sort
 
Binary search
Binary search Binary search
Binary search
 
JSON(JavaScript Object Notation)
JSON(JavaScript Object Notation)JSON(JavaScript Object Notation)
JSON(JavaScript Object Notation)
 
Stemming algorithms
Stemming algorithmsStemming algorithms
Stemming algorithms
 
Step by step guide to install dhcp role
Step by step guide to install dhcp roleStep by step guide to install dhcp role
Step by step guide to install dhcp role
 
Network essentials chapter 4
Network essentials  chapter 4Network essentials  chapter 4
Network essentials chapter 4
 
Network essentials chapter 3
Network essentials  chapter 3Network essentials  chapter 3
Network essentials chapter 3
 
Network essentials chapter 2
Network essentials  chapter 2Network essentials  chapter 2
Network essentials chapter 2
 
Network essentials - chapter 1
Network essentials - chapter 1Network essentials - chapter 1
Network essentials - chapter 1
 
Python chapter 2
Python chapter 2Python chapter 2
Python chapter 2
 
python chapter 1
python chapter 1python chapter 1
python chapter 1
 
Linux Shell Scripting
Linux Shell ScriptingLinux Shell Scripting
Linux Shell Scripting
 
Perl
PerlPerl
Perl
 

Dernier

BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 

Dernier (20)

BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Tesseract OCR Engine

  • 2. Contents • Introduction & history of OCR • Tesseract architecture & methods • Announcing Tesseract 2.00 • Training Tesseract • Future enhancements
  • 3. A Brief History of OCR • What is Optical Character Recognition? • A Brief History of OCR • OCR predates electronic computers!
  • 4.
  • 5. A Brief History of OCR • 1929 – Digit recognition machine • 1953 – Alphanumeric recognition machine • 1965 – US Mail sorting • 1965 – British banking system • 1976 – Kurzweil reading machine • 1985 – Hardware-assisted PC software • 1988 – Software-only PC software • 1994-2000 – Industry consolidation
  • 6. Tesseract Background • Developed on HP-UX at HP between 1985 • and 1994 to run in a desktop scanner. • Came neck and neck with Caere and XIS • in the 1995 UNLV test. • (See http://www.isri.unlv.edu/downloads/AT-1995.pdf ) • Never used in an HP product. • Open sourced in 2005. Now on: • http://code.google.com/p/tesseract-ocr • Highly portable.
  • 7. Tesseract OCR Architecture • Baselines are rarely perfectly straight • Text Line Finding – skew independent – • published at ICDAR’95 Montreal. • (http://scholar.google.com/scholar?q=skew+detection+smith) • Baselines are approximated by quadratic splines • to account for skew and curl. • Meanline, ascender and descender lines are a • constant displacement from baseline. • Critical value is the x-height.
  • 8. Spaces between words are tricky too • Italics, digits, punctuation all create special-case font-dependent spacing. • • Fully justified text in narrow columns can have vastly varying spacing on different lines. • Tesseract: Recognize Word • Outline Approximation • Polygonal approximation is a double-edged sword. • Noise and some pertinent information are both lost.
  • 9. Tesseract: Features and Matching • Static classifier uses outline fragments as features. Broken characters are easily recognizable by a small->large matching process in classifier. (This is slow.) • Adaptive classifier uses the same technique! • (Apart from normalization method)
  • 10. Announcing tesseract-2.00 • Fully Unicode (UTF-8) capable • Already trained for 6 Latin-based • languages (Eng, Fra, Ita, Deu, Spa, Nld) • Code and documented process to train at • http://code.google.com/p/tesseract-ocr • UNLV regression test framework • Other minor fixes
  • 11. Commercial OCR v Tesseract • Page layout analysis. • More languages. • Improve accuracy. • Add a UI.