IMPACT Final Event 26-06-2012 - The Functional Extension Parser (FEP) and Ebooks On Demand (EOD) by Andreas Parschalk (University of Innsbruck)
1. eBooks on Demand and FEP
Andreas Parschalk, University of Innsbruck (UIBK), Library
andreas.parschalk@uibk.ac.at
2. Overview
EOD – the service
Overview
Libraries workflow
End-users' view
EOD and the Functional Extension Parser
The Functional Extension Parser (FEP)
Integration into the workflow
Current status
3. EOD – the service
What is EOD?
Network of libraries
Digitisation on demand for copyright-free books
Started in 2006, co-funded by the EC in the eTEN programme
Delivering digitised books since 2007
4. EOD – the service
EOD button: digitising this book on request
Library: scans & transfers images
Incorporation into Digital Library & Europeana
7. EOD libraries
Austria: University Libraries of Innsbruck, Graz and Vienna (2x), Vienna City Library
Germany: Bavarian State Library (Munich), University Libraries of Regensburg, Greifswald, Berlin (Humboldt University), Saxon State Library (Dresden), STABI Berlin
Denmark: Royal Library
Estonia: National Library, University Library of Tartu
France: Academic health library (Paris)
Hungary: National Széchényi Library of Hungary, Library of the Hungarian Academy of Science
Portugal: National Library
Slovakia: University Library of Bratislava, Slovak Academy of Sciences
Slovenia: National and University Library
Sweden: University Library of Umeå, National Library of Sweden
Switzerland: National Library of Switzerland, Library at Guisanplatz
8. EOD – the service
What is being digitised
Only public domain books according to
laws and regulations of the libraries'
country
Aim: „Full informational capture“
Whole books cover to cover
Virtually counted blank pages
Supplements (maps, tables, …) that form
an integral part of the document
9. EOD: The Libraries' point of view
Central services used by libraries
Web application for the administration of orders and generation of eBooks
Automation of communication (automated e-mails to end-users, tracking page with status update)
OCR (optical character recognition) services: antiqua and gothic font
NEW: Structural Analysis (FEP)
Delivery of CD-ROMs (optional)
Preprint preparation for reprint orders (optional)
Reprint creation and delivery
Central management of credit card payments
10. Carried out locally at library sites
Scanning and uploading of material
Handling orders in Order Data Manager
Uploading to local digital repositories
Long term storage
11. EOD: The Libraries' point of view
Workflow for the libraries
Order arrives
Order the book in the library
Check the order details (can the book be digitised? correct the automatically fetched metadata)
Scan book cover to cover
Upload the images
Start eBook generation
Check results and finish the order
13. EOD: The Libraries' point of view
Ebook generation
Configuring settings
Resolution and JPEG quality
With or without OCR
OCR settings (language, font type)
Deskew, despeckle
Start eBook generation
Create EOD cover pages
Alternatively generate eBook locally
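The settings listed on this slide can be pictured as a small configuration record. The following is only a sketch: the class name, field names and default values are hypothetical and are not taken from the EOD web application.

```python
from dataclasses import dataclass, asdict


@dataclass
class EbookSettings:
    """Hypothetical container for the options named on the slide."""
    resolution_dpi: int = 300       # illustrative default, not an EOD value
    jpeg_quality: int = 80          # 1..100, illustrative default
    run_ocr: bool = True            # "with or without OCR"
    ocr_language: str = "German"    # OCR setting: language
    ocr_font_type: str = "gothic"   # OCR setting: "antiqua" or "gothic"
    deskew: bool = True
    despeckle: bool = True
    add_cover_pages: bool = True    # "Create EOD cover pages"

    def validate(self) -> None:
        """Reject obviously inconsistent values before a job is started."""
        if self.resolution_dpi <= 0:
            raise ValueError("resolution must be positive")
        if not 1 <= self.jpeg_quality <= 100:
            raise ValueError("JPEG quality must be in 1..100")
        if self.ocr_font_type not in ("antiqua", "gothic"):
            raise ValueError("font type must be 'antiqua' or 'gothic'")


settings = EbookSettings(ocr_language="Latin", ocr_font_type="antiqua")
settings.validate()
print(asdict(settings))
```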
15. EOD: The Libraries' point of view
The library can download the OCR output as zipped single-page XML files and as RTF
Use in local repository (e.g. full text search)
Digitisation for the visually impaired
Possible full text correction
Conversion to other formats (e.g. ePub)
No structural information so far. Libraries have been requesting METS/ALTO output
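As an illustration of the "full text search" use case, the sketch below reads a ZIP archive of per-page XML files and builds a naive page-level text lookup. The XML schema of the EOD OCR output is not described on the slides, so the code deliberately ignores element names and just collects all text nodes; the file names and the demo archive are made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List


def page_texts(zip_bytes: bytes) -> Dict[str, str]:
    """Map each XML file name in the archive to its plain text."""
    texts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".xml"):
                continue
            root = ET.fromstring(archive.read(name))
            # itertext() walks every text node, so no schema knowledge is needed.
            texts[name] = " ".join(" ".join(root.itertext()).split())
    return texts


def search(texts: Dict[str, str], term: str) -> List[str]:
    """Return the names of the pages whose text contains the term."""
    term = term.lower()
    return [name for name, text in texts.items() if term in text.lower()]


def _demo_archive() -> bytes:
    """Build a tiny in-memory archive with two made-up pages."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("0001.xml", "<page><line>Erstes Kapitel</line></page>")
        archive.writestr("0002.xml", "<page><line>Zweites Kapitel</line></page>")
    return buffer.getvalue()


texts = page_texts(_demo_archive())
hits = search(texts, "zweites")
```

A real repository would hand the extracted text to its own search index; the substring match here only stands in for that step.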
17. The end-users' point of view
Find the record of the book in catalogue
Click EOD button
Fill out order form
1-2 weeks delivery time, depending on the library
Pay online
Download and use
18. The end-users' point of view
The catalogue situation: diverse and dispersed
OPACs
Digitised card catalogues
Union catalogues
The EOD search engine
In addition to the EOD button in the libraries' catalogues
http://search.books2ebooks.eu
3 million records of digitisable and digitised items
19 EOD libraries have already integrated their records
23. EOD and the FEP
Motivation
Improve output for libraries
Structural information
METS/ALTO
Improve output for end-users
Enhance PDF with clickable TOC
24. EOD and the FEP
Prerequisites
XML output of the OCR for the complete document
Images of the scanned document
Coordinates in the OCR XML must correspond with the coordinates in the images (deskew images beforehand)
Quality of the scans and OCR as good as possible
FEP works with the XML output of EOD eBook generation
Automatically extracts structural information about the document
Page numbers
Table of Contents
Offers a web interface to manually correct and enhance the result
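The coordinate prerequisite can be partly checked before a book is handed to FEP. The sketch below flags OCR boxes that cannot belong to the given image because they fall outside its pixel dimensions. It is a necessary check only: it will not detect OCR that was run on a differently deskewed copy of the same size. The `Box` type and function name are hypothetical and not part of FEP.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of one OCR element, in image pixels (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def boxes_outside_image(boxes: List[Box], width: int, height: int) -> List[Box]:
    """Return boxes that are degenerate or extend beyond the image area."""
    suspicious = []
    for box in boxes:
        degenerate = box.left >= box.right or box.top >= box.bottom
        outside = (box.left < 0 or box.top < 0
                   or box.right > width or box.bottom > height)
        if degenerate or outside:
            suspicious.append(box)
    return suspicious


# Example: the second box ends at x=2600, beyond an image 2480 pixels wide.
boxes = [Box(10, 10, 100, 50), Box(2400, 10, 2600, 50)]
bad = boxes_outside_image(boxes, width=2480, height=3508)
```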
25. EOD and the FEP
Integration of FEP into EOD workflow
Regular EOD eBook generation
Operators decide if FEP is possible/useful
Scan quality
OCR quality
Structure of the book
Start automatic recognition
Check/correct/modify results in FEP web interface
26. EOD and the FEP
The operator finds books with automatically recognized structure in the FEP web interface and can then enhance/correct the recognized print space, pagination and TOC (optionally also the logical structure)
29. EOD and the FEP
After all correction steps are done
METS/ALTO files
Enhanced PDF
If results are ok
Replace regular PDF with enhanced PDF by uploading to ODM via FTP
End-users download enhanced PDF as usual through their EOD tracking page
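The FTP upload step could be scripted along the following lines with Python's standard `ftplib`. Host, credentials, and the remote file name are placeholders; the slides do not say how ODM expects the file to be named or where it must be placed, so those details are assumptions.

```python
import io
from ftplib import FTP
from pathlib import Path


def upload_bytes(ftp, data: bytes, remote_name: str) -> None:
    """Store the given bytes on the server under remote_name."""
    ftp.storbinary(f"STOR {remote_name}", io.BytesIO(data))


def replace_regular_pdf(host: str, user: str, password: str,
                        enhanced_pdf: Path, remote_name: str) -> None:
    """Upload the enhanced PDF; the remote naming scheme is an assumption."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        upload_bytes(ftp, enhanced_pdf.read_bytes(), remote_name)
```

Keeping the transfer in `upload_bytes` and passing the connection in makes the step testable without a live server.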
31. EOD and the FEP
Current status
Interface OrderDataManager – FEP core implemented and workflow adapted
Internal testing phase finished
Online and offline workshops were held to familiarize EOD operators with the FEP correction web interface
Ready for production environment
Beta testing and feedback period with 10 selected EOD network libraries until end of July
33. Thank you for your attention!
Andreas.Parschalk@uibk.ac.at