K. Zagoris, K. Ergina and N. Papamarkos
Image Processing and Multimedia Laboratory
Department of Electrical & Computer Engineering
Democritus University of Thrace,
67100 Xanthi, Greece
• Phenomenal growth in the volume of multimedia data, especially document images
• Caused by the ease of creating such images with scanners or digital cameras
• Huge quantities of document images are created and stored in image archives without any indexing information
The overall structure of the Document Image Retrieval System
• Binarization (Otsu technique)
• Original Document
• Median Filter
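A minimal sketch of the two preprocessing operations named on this slide, written in plain Python on 2-D lists of gray values. The function names, the 3x3 window size, and the convention that ink is darker than the background are assumptions, and the slide does not state the order in which the two operations are applied.

```python
from statistics import median


def otsu_threshold(gray):
    """Return the Otsu threshold for a 2-D list of integers in 0..255."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, t):
    """1 = black (ink) pixel, 0 = background; ink is assumed to be darker."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


def median_filter3(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out
```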
Word Segmentation
Using the Connected Components Labeling and Filtering method
• Identify all the Connected Components (CCs)
• Calculate the most common height of the document CCs (CCch)
• Reject the CCs whose height is less than 70% of the CCch. This rejects only punctuation marks and noise.
• Expand the left and right sides of the resulting CCs by 20% of the CCch
• The words are the merged overlapping CCs
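The filtering and merging steps listed on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the connected components are assumed to be already labeled and given as bounding boxes `(left, top, right, bottom)`, all boxes are assumed to lie on a single text line, and the function name `segment_words` is hypothetical.

```python
from collections import Counter


def segment_words(boxes):
    """Merge CC bounding boxes (left, top, right, bottom) into word boxes.

    Assumes every box lies on the same text line.
    """
    heights = [bottom - top for (_, top, _, bottom) in boxes]
    # CCch: the most common CC height in the document.
    cc_ch = Counter(heights).most_common(1)[0][0]
    # Reject CCs shorter than 70% of CCch (punctuation marks and noise).
    kept = [box for box, h in zip(boxes, heights) if h >= 0.7 * cc_ch]
    # Expand the left and right sides by 20% of CCch.
    pad = 0.2 * cc_ch
    expanded = sorted((l - pad, t, r + pad, b) for (l, t, r, b) in kept)
    # Merge boxes that now overlap horizontally; each merged box is a word.
    words = []
    for l, t, r, b in expanded:
        if words and l <= words[-1][2]:
            pl, pt, pr, pb = words[-1]
            words[-1] = (pl, min(pt, t), max(pr, r), max(pb, b))
        else:
            words.append((l, t, r, b))
    return words
```

For example, three letter boxes of height 10 and one small dot yield two words once the dot is rejected and the two neighbouring letters merge.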
• Width to Height Ratio
• Word Area Density. The percentage of black pixels included in the word-bounding box
• Center of Gravity. The Euclidean distance from the word’s center of gravity to the upper left corner of the bounding box:
$$C_x = \frac{M_{(1,0)}}{M_{(0,0)}}, \qquad C_y = \frac{M_{(0,1)}}{M_{(0,0)}}$$

$$M_{(p,q)} = \sum_{x} \sum_{y} \left(\frac{x}{\mathit{width}}\right)^{p} \left(\frac{y}{\mathit{height}}\right)^{q} f(x,y)$$
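A small sketch of the three scalar features on this slide, assuming the word image is a 2-D list in which `f(x, y)` is 1 for a black pixel and 0 otherwise, that pixel coordinates start at 0 in the upper left corner, and that the word contains at least one black pixel. The function names are hypothetical, and the density is returned as a fraction rather than a percentage.

```python
import math


def moment(word, p, q):
    """Normalized moment M(p,q) of a binary word image (1 = black)."""
    height, width = len(word), len(word[0])
    return sum(
        (x / width) ** p * (y / height) ** q * word[y][x]
        for y in range(height)
        for x in range(width)
    )


def basic_features(word):
    """Return (width/height ratio, area density, center-of-gravity distance)."""
    height, width = len(word), len(word[0])
    m00 = moment(word, 0, 0)          # number of black pixels
    ratio = width / height
    density = m00 / (width * height)  # fraction of black pixels in the box
    cx = moment(word, 1, 0) / m00
    cy = moment(word, 0, 1) / m00
    # Distance from (Cx, Cy) to the upper left corner, taken as the origin.
    return ratio, density, math.hypot(cx, cy)
```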
• Vertical Projection. The first twenty (20) coefficients of the Discrete Cosine Transform (DCT) of the smoothed and normalized vertical projection.
[Figure: the original image, its vertical projection, and the smoothed and normalized projection]
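The vertical projection feature can be sketched as below. The slide does not say how the projection is smoothed or normalized, so the moving average and the scaling to a maximum of 1 are assumptions, as are the function names; the DCT shown is the unscaled DCT-II.

```python
import math


def vertical_projection(word):
    """Number of black pixels (1s) in each column of a binary word image."""
    return [sum(col) for col in zip(*word)]


def smooth(values, radius=1):
    """Moving average; the window shrinks at both ends (assumed smoothing)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def normalize(values):
    """Scale so that the largest value is 1 (assumed normalization)."""
    peak = max(values)
    return [v / peak for v in values] if peak else list(values)


def dct(values, n_coeffs):
    """First n_coeffs coefficients of the unscaled DCT-II."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def vertical_projection_feature(word, n_coeffs=20):
    """DCT coefficients of the smoothed, normalized vertical projection."""
    return dct(normalize(smooth(vertical_projection(word))), n_coeffs)
```

Keeping only the low-order coefficients discards fine detail in the projection, which is consistent with the goal of tolerating small differences between word images.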
• Top-Bottom Shape Projections. A vector of 50 elements
• The first 25 values are the first 25 DCT coefficients of the smoothed and normalized Top Shape Projection
• The remaining 25 values are the first 25 DCT coefficients of the smoothed and normalized Bottom Shape Projection
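The slide does not define the shape projections themselves. The sketch below assumes a common definition: for each column, the distance from the top (or bottom) edge of the bounding box to the first black pixel. Smoothing and normalization are omitted for brevity, and all function names are hypothetical.

```python
import math


def top_profile(word):
    """Per column: rows from the top edge to the first black pixel.

    Columns without black pixels get the full image height.
    """
    height = len(word)
    return [
        next((y for y in range(height) if col[y]), height)
        for col in zip(*word)
    ]


def bottom_profile(word):
    """Same as top_profile, measured upward from the bottom edge."""
    return top_profile(word[::-1])


def dct(values, n_coeffs):
    """First n_coeffs coefficients of the unscaled DCT-II."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def shape_feature(word, n_coeffs=25):
    """Concatenate the top and bottom DCT coefficients (2 * n_coeffs values)."""
    return dct(top_profile(word), n_coeffs) + dct(bottom_profile(word), n_coeffs)
```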
• Upper Grid Features: a ten-element vector of binary values extracted from the upper part of each word image.
• Down Grid Features: a ten-element vector of binary values extracted from the lower part of each word image.
[0,0,0,1195,0,0,0,0,0,0]
[0,0,0,1,0,0,0,0,0,0]
[0,0,0,0,0,0,0,598,50,33]
[0,0,0,0,0,0,0,1,1,0]
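The slide does not state how the binary values are obtained. The example vectors suggest that each of the ten cells first holds a count, which is then thresholded to 0 or 1. The sketch below illustrates only that last step; the function name and the threshold value of 40 are hypothetical, chosen so that the example vectors are reproduced (50 maps to 1, 33 maps to 0).

```python
def binarize_counts(counts, threshold=40):
    """Map each cell count to 1 if it reaches the threshold, else 0.

    The threshold is an assumption; the source gives no value for it.
    """
    return [1 if c >= threshold else 0 for c in counts]
```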
The Structure of the Descriptor
• The user enters a query word
• The proposed system creates an image of the query word with font height equal to the average height of all the word boxes obtained in the Word Segmentation stage of the offline operation
• For our experimental set, the average height is 50
• The font of the query image is Arial
• Smoothing and normalizing the features described above suppresses small differences between font types
The Matching Process
 100 image documents created artificially from various
texts
 Then Gaussian and “Salt and Pepper” noise was added
 Implement in parallel a text search engine which
makes easier the verification and evaluation of the
search results of the proposed system
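A minimal sketch of how the two kinds of noise might be added to a grayscale image stored as a 2-D list of integers in 0..255. The noise levels used in the experiments are not given on the slide, so `amount` and `sigma` are hypothetical parameters, as are the function names.

```python
import random


def add_salt_and_pepper(gray, amount, seed=0):
    """Set a fraction `amount` of pixels to pure black (0) or white (255)."""
    rng = random.Random(seed)
    out = [row[:] for row in gray]
    for row in out:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = rng.choice((0, 255))
    return out


def add_gaussian(gray, sigma, seed=0):
    """Add zero-mean Gaussian noise and clip the result to 0..255."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in row]
        for row in gray
    ]
```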
Implementation
o Visual Studio 2008
o Microsoft .NET Framework 2.0
o C# Language
o Microsoft SQL Server 2005
http://orpheus.ee.duth.gr/irs2_5/
Evaluation
o Precision and Recall metrics
o 30 searches in 100 document images
o Query font: Arial
• Mean Precision: 87.8%
• Mean Recall: 99.26%
FineReader® 9.0 OCR Program
• Mean Precision: 76.67%
• Mean Recall: 58.42%
Query Font Name “Tahoma”
• Mean Precision: 89.44%
• Mean Recall: 88.05%
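For reference, precision and recall for a single search, and their means over several searches, can be computed as in this sketch. The function names and the identifiers in the example are hypothetical; the figures reported in these slides come from the authors' own experiments, not from this code.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: items returned by the system
    relevant:  items that truly match the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall(results):
    """Average (precision, recall) pairs over a non-empty list of queries."""
    n = len(results)
    return (
        sum(p for p, _ in results) / n,
        sum(r for _, r in results) / n,
    )
```

A search that returns four words, two of which are among the three true matches, has precision 2/4 and recall 2/3.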
• The query word is given as text and then transformed into a word image
• The proposed system extracts nine (9) powerful features to describe the word images
• These features describe the shape of the words satisfactorily while also suppressing small differences due to noise, font size, and font type
• In our experiments, the proposed system performs better than a commercial OCR package on the same database
Developing Document Image Retrieval System
