Slides of the presentation of the paper "Document Representation Refinement for Precise Region Description" by Christian Clausner, Stefan Pletschacher and Apostolos Antonacopoulos.
Datech2014-Session1-Document Representation Refinement for Precise Region Description
1. Document Representation Refinement
for Precise Region Description
Christian Clausner, Stefan Pletschacher and
Apostolos Antonacopoulos
PRImA Lab, School of Computing, Science and
Engineering, University of Salford,
United Kingdom
2. Document Page Regions
DATeCH 2014 2
Segmentation,
Classification
• Region (block, zone): Connected area of a
document image with content of a single
specific type
• Examples: Text, graphic, table
3. Region Representation
• By geometric objects
– Bounding box
– Stack of rectangles
– Polygon
• By pixels
– Bitmap
– Run-length encoding
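To make the difference between these representations concrete, here is a minimal Python sketch that derives a bounding box and a run-length encoding from the same small bitmap. The function names and the (start, length) run format are assumptions chosen for illustration; they are not taken from the paper or from any of the file formats discussed in these slides.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (start_x, length) of one horizontal foreground run


def rle_encode_row(row: List[int]) -> List[Run]:
    """Encode one bitmap row as (start, length) runs of foreground pixels."""
    runs: List[Run] = []
    start = None
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x                       # a new run begins
        elif not pixel and start is not None:
            runs.append((start, x - start))  # the run ended at x - 1
            start = None
    if start is not None:                   # run reaches the end of the row
        runs.append((start, len(row) - start))
    return runs


def bounding_box(bitmap: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), inclusive, or None if empty."""
    xs = [x for row in bitmap for x, p in enumerate(row) if p]
    ys = [y for y, row in enumerate(bitmap) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


bitmap = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
encoded = [rle_encode_row(row) for row in bitmap]
box = bounding_box(bitmap)
foreground = sum(length for runs in encoded for _, length in runs)
box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
print(encoded, box, foreground, box_area)
```

In this toy case the bounding box covers 12 pixels while the region itself has only 8, which is the kind of imprecision the following slides are concerned with.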
4. Need for Precise Region Descriptions
• Precise description is crucial for all but the most
trivial document analysis and recognition
applications
• For performance evaluation:
The loss of quality introduced
by imprecise regions can be
bigger than the variation of
accuracy of the actual
recognition method
5. The Situation
• Trend to more precise descriptions, but…
• Output of state-of-the-art OCR systems:
– Stacks of rectangles (ABBYY FineReader Engine 11)
– Bounding boxes (Tesseract OCR 3.02)
• Popular formats for layout analysis and OCR results:
– ALTO XML (boxes, ellipses, polygons (region level only))
– FineReader XML (stacks of rectangles (region level only))
– PAGE XML (polygons for all levels)
– HOCR (boxes)
6. Refinement through Polygonal Fitting
• Applicable to regions that
have child objects in the
document model
• A typical object hierarchy
contains regions, text lines,
words and glyphs (characters)
• Idea: Tightly wrap a polygon
around the child objects
7. Polygonal Fitting Approach
1. Create bitmasks for the child
objects and transfer them to an
empty bitmap
2. Fill the gaps between the child
objects by a smearing approach
3. Optional: Exclude neighbour
regions
4. Trace the contour of the
foreground and create a polygon
8. 1 – Transferring Child Object to Bitmap
• Starting point: Polygonal object (e.g. text line,
word, or glyph)
• Lossless conversion to rectangle-based interval
representation
• Transferring the rectangles to the target bitmap
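The conversion from polygon to intervals to bitmap could be sketched roughly as follows. This is not the authors' implementation: it is a plain scanline fill with the even-odd rule, sampled at pixel centres, and the names `polygon_to_intervals` and `paint_intervals` are hypothetical. Under these assumptions the result is exact for axis-aligned polygons with non-negative integer vertices and only an approximation for slanted edges.

```python
import math


def polygon_to_intervals(points):
    """Convert a polygon [(x, y), ...] to row intervals (y, x_start, x_end).

    x_end is exclusive. Each row is sampled at its pixel centre (y + 0.5)
    and crossings are paired using the even-odd rule.
    """
    ys = [y for _, y in points]
    n = len(points)
    intervals = []
    for y in range(int(min(ys)), int(max(ys))):
        yc = y + 0.5
        crossings = []
        for i in range(n):
            (x1, y1), (x2, y2) = points[i], points[(i + 1) % n]
            # Horizontal edges never satisfy this test, so no division by 0.
            if (y1 <= yc < y2) or (y2 <= yc < y1):
                crossings.append(x1 + (yc - y1) * (x2 - x1) / (y2 - y1))
        crossings.sort()
        for left, right in zip(crossings[0::2], crossings[1::2]):
            # A pixel x belongs to the interval if its centre x + 0.5
            # lies in [left, right).
            x_start = math.ceil(left - 0.5)
            x_end = math.ceil(right - 0.5)
            if x_end > x_start:
                intervals.append((y, x_start, x_end))
    return intervals


def paint_intervals(bitmap, intervals):
    """Set the pixels of every interval to foreground (1), clipped to the bitmap."""
    height, width = len(bitmap), len(bitmap[0])
    for y, x_start, x_end in intervals:
        if 0 <= y < height:
            for x in range(max(x_start, 0), min(x_end, width)):
                bitmap[y][x] = 1
    return bitmap


# An L-shaped child object, e.g. a text line with a shorter second row.
l_shape = [(0, 0), (4, 0), (4, 1), (2, 1), (2, 3), (0, 3)]
intervals = polygon_to_intervals(l_shape)
target = paint_intervals([[0] * 5 for _ in range(3)], intervals)
for row in target:
    print(row)
```

Painting several child objects into the same target bitmap is then just repeated calls to `paint_intervals` with the intervals of each object.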
9. 2 – Smearing Approach
• Goal: Connect all foreground
components in the bitmap by
filling the gaps in-between
1. Alternately fill horizontal and
vertical gaps if they are smaller
than a dynamic threshold
(threshold is increased after
each iteration)
2. If necessary, use diagonal
smearing to connect remaining
components
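A simplified sketch of the alternating smearing loop is shown below. Several details are assumptions, since the slide does not specify them: the start threshold, the step size, the upper limit, treating one horizontal pass plus one vertical pass as a single iteration, and the use of 4-connectivity to decide when all components are connected. The diagonal smearing fallback is left out.

```python
def fill_row_gaps(bitmap, threshold):
    """Fill background gaps shorter than threshold between foreground pixels of a row."""
    for row in bitmap:
        last_fg = None
        for x, pixel in enumerate(row):
            if pixel:
                if last_fg is not None and 0 < x - last_fg - 1 < threshold:
                    for gx in range(last_fg + 1, x):
                        row[gx] = 1
                last_fg = x


def fill_column_gaps(bitmap, threshold):
    """Same as fill_row_gaps, applied to columns via a transposed copy."""
    transposed = [list(col) for col in zip(*bitmap)]
    fill_row_gaps(transposed, threshold)
    for y, row in enumerate(zip(*transposed)):
        bitmap[y][:] = row


def count_components(bitmap):
    """Count 4-connected foreground components with a flood fill."""
    height, width = len(bitmap), len(bitmap[0])
    seen, count = set(), 0
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] and (y, x) not in seen:
                count += 1
                seen.add((y, x))
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and bitmap[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return count


def smear(bitmap, start_threshold=2, step=1, max_threshold=50):
    """Alternate horizontal and vertical gap filling with a growing threshold."""
    threshold = start_threshold  # all three parameters are illustrative guesses
    while count_components(bitmap) > 1 and threshold <= max_threshold:
        fill_row_gaps(bitmap, threshold)
        fill_column_gaps(bitmap, threshold)
        threshold += step
    return bitmap


# Two words on the first line and a shorter second line below.
page = [
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
smeared = smear(page)
for row in smeared:
    print(row)
```

Because the threshold starts small, narrow gaps (between lines here) close before wide ones, so the filled area stays close to the child objects instead of expanding to the full bounding box.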
10. 3 – Subtraction of Neighbours
• Optional step to avoid
overlap with adjacent
regions
• Simply erase the
corresponding pixels from
the created bitmap
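Erasing neighbour pixels is a simple mask subtraction. The sketch below assumes that each neighbouring region is available as a bitmap of the same size as the region bitmap; the function name `subtract_neighbours` is hypothetical.

```python
def subtract_neighbours(bitmap, neighbour_masks):
    """Clear every pixel of bitmap that is foreground in any neighbour mask."""
    for mask in neighbour_masks:
        for y, row in enumerate(mask):
            for x, pixel in enumerate(row):
                if pixel:
                    bitmap[y][x] = 0
    return bitmap


region = [
    [1, 1, 1],
    [1, 1, 1],
]
neighbour = [
    [0, 0, 1],
    [0, 0, 1],
]
cleaned = subtract_neighbours(region, [neighbour])
print(cleaned)
```

Note that subtraction can in principle split the foreground into several components again, which a complete implementation would have to handle before tracing the outline.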
11. 4 – Outline Tracing
• Trace the contour of the
foreground component
in the created bitmap
• Create the polygon on the fly by
adding a point for
each change of direction
(corner)
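One possible way to trace the outline is to walk along the pixel edges that separate foreground from background and to record a vertex only where the walking direction changes. The sketch below is an illustration under stated assumptions, not the tracing routine used by the authors: it follows only the outer contour of the component that contains the first foreground pixel in raster order, ignores holes, and treats diagonally touching pixels as not connected. The name `trace_outline` is hypothetical.

```python
def trace_outline(bitmap):
    """Return the outer contour as a list of corner points (x, y)."""
    height, width = len(bitmap), len(bitmap[0])

    def fg(x, y):
        return 0 <= x < width and 0 <= y < height and bitmap[y][x] == 1

    # Directed boundary edges, clockwise on screen (foreground on the right).
    edges = set()
    for y in range(height):
        for x in range(width):
            if not fg(x, y):
                continue
            if not fg(x, y - 1):
                edges.add(((x, y), (x + 1, y)))          # top edge, heading east
            if not fg(x + 1, y):
                edges.add(((x + 1, y), (x + 1, y + 1)))  # right edge, heading south
            if not fg(x, y + 1):
                edges.add(((x + 1, y + 1), (x, y + 1)))  # bottom edge, heading west
            if not fg(x - 1, y):
                edges.add(((x, y + 1), (x, y)))          # left edge, heading north
    if not edges:
        return []

    # The top-left corner of the first foreground pixel is always a corner
    # of the outer contour, and its top edge always exists.
    start = next((x, y) for y in range(height) for x in range(width) if fg(x, y))
    point, direction = start, (1, 0)
    polygon = [start]
    while True:
        point = (point[0] + direction[0], point[1] + direction[1])
        if point == start:
            break
        dx, dy = direction
        # Try turning right first, then straight, then left.
        for cand in ((-dy, dx), (dx, dy), (dy, -dx)):
            if (point, (point[0] + cand[0], point[1] + cand[1])) in edges:
                if cand != direction:
                    polygon.append(point)  # direction changed: this is a corner
                direction = cand
                break
    return polygon


shape = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
outline = trace_outline(shape)
print(outline)
```

Straight runs of any length contribute no points, so the polygon stays compact even for large regions.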
12. Experiments
• Carried out on a dataset
of contemporary
documents consisting of
scanned magazine and
technical article pages
• Processed with Tesseract
OCR 3.02 (open source)
• Exported to PAGE XML
with and without
refinement
14. Results
• Measurement of region overlaps (number and
area)
                     Overlapping Regions    Overlap Area (Megapixel)
Original Outlines    621 (45.8%)            19.9
Refined Outlines     286 (21.1%)            2.5
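The slides do not define exactly how the two overlap measures were computed. As one plausible reading, the sketch below counts a region as overlapping if it shares at least one pixel with another region, and takes the overlap area to be the number of pixels covered by more than one region. Both definitions, the helper `rect`, and the toy data are assumptions for illustration only.

```python
from collections import Counter


def rect(x0, y0, x1, y1):
    """Pixel set of an axis-aligned rectangle; x1 and y1 are exclusive."""
    return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}


def measure_overlaps(regions):
    """Return (number of overlapping regions, overlap area in pixels).

    Each region is a set of (x, y) pixels.
    """
    coverage = Counter(pixel for region in regions for pixel in region)
    shared = {pixel for pixel, n in coverage.items() if n > 1}
    overlapping = sum(1 for region in regions if region & shared)
    return overlapping, len(shared)


regions = [
    rect(0, 0, 4, 4),      # overlaps the next region in one pixel
    rect(3, 3, 6, 6),
    rect(10, 10, 12, 12),  # isolated region
]
count, area = measure_overlaps(regions)
print(count, area)
```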
15. Impact on Performance Evaluation
• Real-world scenario
• Measure the performance of the Tesseract OCR engine
• Evaluation metrics of previous ICDAR page
segmentation competitions
Average success rate using original outlines: 81.1%
Average success rate using refined outlines: 84.5%
Average improvement for all documents: 3.4%
Maximum improvement: 22.9%
16. Conclusion
• Existing geometric region data can be significantly refined by fitting
precise polygons around child objects
• Validity and impact on real-world scenarios have been shown
• Refinement in performance evaluation helps to eliminate problems
that arise from insufficient geometric descriptions → Concentrate
on real issues of OCR methods
• Positive effect on accuracy of presentation/repurposing systems
(highlighting, cropping, article tracking, etc.)
• Approach used in Aletheia ground truth editor and result viewer
(primaresearch.org/tools)