SlideShare a Scribd company logo
1 of 30
Download to read offline
OCRFeeder

static void
_f_do_barnacle_install_properties(GObjectClass
*gobject_class)
{
GParamSpec *pspec;
/* Party code attribute */
pspec = g_param_spec_uint64
(F_DO_BARNACLE_CODE,
"Barnacle code.",
"Barnacle code",
0,
G_MAXUINT64,
G_MAXUINT64 /*
default value */,
G_PARAM_READABLE
| G_PARAM_WRITABLE |
G_PARAM_PRIVATE);

OCR Made Easy on GNOME

g_object_class_install_property (gobject_class,
F_DO_BARNACLE_PROP_CODE,

Joaquim Rocha
jrocha@igalia.com

July 27 2012
What is it?
Document Analysis and Optical
Character Recognition
for GNOME

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Why?
Paper has a number of problems
No applications for GNU/Linux to do
a fair job

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Paper problems:
Security

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

CC Photo by: http://www.flickr.com/photos/badwsky/
Paper problems:
Preservation

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

CC Photo by: http://www.flickr.com/photos/98469445@N00/
Paper problems:
Data processing

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

CC Photo by: http://www.flickr.com/photos/hugovk/
Paper problems:
Ecology

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

CC Photo by: http://www.flickr.com/photos/pranavsingh/
Paper problems:
Accessibility

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

CC Photo by: http://www.flickr.com/photos/illustrator/
No fair conversion apps for
GNU/Linux
apart from OCR engines, but...

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
OCR != Document Conversion
(it only deals with chars)
(does not consider the layout)
(does not distinguish contents)

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
What's needed is
Document Analysis and
Recognition
(conversion of documents to an
electronic format)
(first projects in the 80s)
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
How it works

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
So many layouts...

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

CC Photo by: http://www.flickr.com/photos/uber-tuber/
Layouts vary with the type of
document
What works on detecting one, won't
work on others

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
OCRFeeder focuses on contents,
not on layouts!

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Key concept:
If a document image can be
divided in windows of 1 (content)
or 0 (not content),
then it is possible to group all the
1s and outline the contents
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Recognition:
System-wide OCR engines are used
Engines are configured from the
GUI or XML files

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Most known free OCR engines are
detected and configured
automatically:
* Tesseract
* GOCR
* OCRAD
* Cuneiform
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Exportation formats:
ODT
HTML
Plain text
PDF

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
User interaction:
Users can edit everything
and review the algorithm's results
So, UI can work in attended and
unattended ways
CLI only works in an unattended
mode
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Demo time!

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Other features:
* PDF importation
* Unpaper preprocessor
* Font style edition
* Image deskewing
* OCR results cleaning
* Project saving/loading
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Future:
* More exportation formats: HOCR,
etc.
* Make OCR engines' management
easier
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Webpage:
http://live.gnome.org/OCRFeeder
git:
http://git.gnome.org/ocrfeeder
Bugzilla:
http://bugzilla.gnome.org
product: OCRFeeder

Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012
Thank you!
Joaquim Rocha (Igalia) · OCRFeeder · GUADEC 2012

More Related Content

Similar to OCRFeeder - OCR made easy on GNOME (GUADEC 2012)

Grilo: Integrating Multimedia Content in Applications (ELCE 2010)
Grilo: Integrating Multimedia Content in Applications (ELCE 2010)Grilo: Integrating Multimedia Content in Applications (ELCE 2010)
Grilo: Integrating Multimedia Content in Applications (ELCE 2010)
Igalia
 
Introduction google glass en - rev 20 - codemotion
Introduction google glass   en - rev 20 - codemotionIntroduction google glass   en - rev 20 - codemotion
Introduction google glass en - rev 20 - codemotion
Codemotion
 

Similar to OCRFeeder - OCR made easy on GNOME (GUADEC 2012) (20)

Grilo: Integration of Multimedia Contents in Applications Made Easy (FOSDEM 2...
Grilo: Integration of Multimedia Contents in Applications Made Easy (FOSDEM 2...Grilo: Integration of Multimedia Contents in Applications Made Easy (FOSDEM 2...
Grilo: Integration of Multimedia Contents in Applications Made Easy (FOSDEM 2...
 
Grilo: Integrating Multimedia Content in Applications (ELCE 2010)
Grilo: Integrating Multimedia Content in Applications (ELCE 2010)Grilo: Integrating Multimedia Content in Applications (ELCE 2010)
Grilo: Integrating Multimedia Content in Applications (ELCE 2010)
 
Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2Distributed Resource Management Application API (DRMAA) Version 2
Distributed Resource Management Application API (DRMAA) Version 2
 
Mender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and GolangMender.io | Develop embedded applications faster | Comparing C and Golang
Mender.io | Develop embedded applications faster | Comparing C and Golang
 
Cross-Platform Mobile Development with Ionic Framework and Angular
Cross-Platform Mobile Development with Ionic Framework and AngularCross-Platform Mobile Development with Ionic Framework and Angular
Cross-Platform Mobile Development with Ionic Framework and Angular
 
BSidesROC 2016 - Jaime Geiger - Android Application Function Hooking With Xposed
BSidesROC 2016 - Jaime Geiger - Android Application Function Hooking With XposedBSidesROC 2016 - Jaime Geiger - Android Application Function Hooking With Xposed
BSidesROC 2016 - Jaime Geiger - Android Application Function Hooking With Xposed
 
Shaping the Future of Automatic Programming
Shaping the Future of Automatic ProgrammingShaping the Future of Automatic Programming
Shaping the Future of Automatic Programming
 
Python debugging techniques
Python debugging techniquesPython debugging techniques
Python debugging techniques
 
Post-Mortem Debugging and Web Development
Post-Mortem Debugging and Web DevelopmentPost-Mortem Debugging and Web Development
Post-Mortem Debugging and Web Development
 
ogr2gui - english version
ogr2gui - english versionogr2gui - english version
ogr2gui - english version
 
Introduction google glass en - rev 20 - codemotion
Introduction google glass   en - rev 20 - codemotionIntroduction google glass   en - rev 20 - codemotion
Introduction google glass en - rev 20 - codemotion
 
Android Things Robocar with TensorFlow for object recognition
Android Things Robocar with TensorFlow for object recognitionAndroid Things Robocar with TensorFlow for object recognition
Android Things Robocar with TensorFlow for object recognition
 
Classification - 2023-03-25
Classification - 2023-03-25Classification - 2023-03-25
Classification - 2023-03-25
 
Ext GWT - Overview and Implementation Case Study
Ext GWT - Overview and Implementation Case StudyExt GWT - Overview and Implementation Case Study
Ext GWT - Overview and Implementation Case Study
 
How to sell drupal 8
How to sell drupal 8How to sell drupal 8
How to sell drupal 8
 
PyConUK 2014 - PostMortem Debugging and Web Development Updated
PyConUK 2014 - PostMortem Debugging and Web Development UpdatedPyConUK 2014 - PostMortem Debugging and Web Development Updated
PyConUK 2014 - PostMortem Debugging and Web Development Updated
 
Secure Code Review 101
Secure Code Review 101Secure Code Review 101
Secure Code Review 101
 
Grilo
GriloGrilo
Grilo
 
BIM/GIS Integration: A Practical Approach in Real Cases
BIM/GIS Integration: A Practical Approach in Real CasesBIM/GIS Integration: A Practical Approach in Real Cases
BIM/GIS Integration: A Practical Approach in Real Cases
 
The Chromium/Wayland project (Web Engines Hackfest 2017)
The Chromium/Wayland project (Web Engines Hackfest 2017)The Chromium/Wayland project (Web Engines Hackfest 2017)
The Chromium/Wayland project (Web Engines Hackfest 2017)
 

More from Igalia

Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
Igalia
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
Igalia
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
Igalia
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Igalia
 

More from Igalia (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Building End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPEBuilding End-user Applications on Embedded Devices with WPE
Building End-user Applications on Embedded Devices with WPE
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded DevicesAutomated Testing for Web-based Systems on Embedded Devices
Automated Testing for Web-based Systems on Embedded Devices
 
Embedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to MaintenanceEmbedding WPE WebKit - from Bring-up to Maintenance
Embedding WPE WebKit - from Bring-up to Maintenance
 
Optimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdfOptimizing Scheduler for Linux Gaming.pdf
Optimizing Scheduler for Linux Gaming.pdf
 
Running JS via WASM faster with JIT
Running JS via WASM      faster with JITRunning JS via WASM      faster with JIT
Running JS via WASM faster with JIT
 
To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!To crash or not to crash: if you do, at least recover fast!
To crash or not to crash: if you do, at least recover fast!
 
Implementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamerImplementing a Vulkan Video Encoder From Mesa to GStreamer
Implementing a Vulkan Video Encoder From Mesa to GStreamer
 
8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa8 Years of Open Drivers, including the State of Vulkan in Mesa
8 Years of Open Drivers, including the State of Vulkan in Mesa
 
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por IgaliaIntroducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
Introducción a Mesa. Caso específico dos dispositivos Raspberry Pi por Igalia
 
2023 in Chimera Linux
2023 in Chimera                    Linux2023 in Chimera                    Linux
2023 in Chimera Linux
 
Building a Linux distro with LLVM
Building a Linux distro        with LLVMBuilding a Linux distro        with LLVM
Building a Linux distro with LLVM
 
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUsturnip: Update on Open Source Vulkan Driver for Adreno GPUs
turnip: Update on Open Source Vulkan Driver for Adreno GPUs
 
Graphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devicesGraphics stack updates for Raspberry Pi devices
Graphics stack updates for Raspberry Pi devices
 
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOSDelegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
Delegated Compositing - Utilizing Wayland Protocols for Chromium on ChromeOS
 
MessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the webMessageFormat: The future of i18n on the web
MessageFormat: The future of i18n on the web
 
Replacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shadersReplacing the geometry pipeline with mesh shaders
Replacing the geometry pipeline with mesh shaders
 
I'm not an AMD expert, but...
I'm not an AMD expert, but...I'm not an AMD expert, but...
I'm not an AMD expert, but...
 
Status of Vulkan on Raspberry
Status of Vulkan on RaspberryStatus of Vulkan on Raspberry
Status of Vulkan on Raspberry
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

OCRFeeder - OCR made easy on GNOME (GUADEC 2012)