Uwe Springmann (1), Dietmar Najock (2), Hermann Morgenroth (2), Helmut Schmid (1), Annette Gotscharek (1) and Florian Fink (1)
OCR of Historical Printings of Latin Texts
Problems, Prospects, Progress
(1) CIS, Ludwig-Maximilians-Universität München
(2) Institute for Greek and Latin Languages and Literatures, Freie Universität Berlin
p. 2 (16) | OCR of historical printings of Latin texts | Springmann et al.
Overview
● Why Latin?
● Problems
● Prospects
● Progress
Why Latin?
● huge heritage: largest body of historical literary sources
● Latin publications dominate print production until about 1750
● many titles have never been reprinted
● either key or barrier to cultural heritage of the western world
● has been left out of the IMPACT project despite its importance
Some problems for OCR engines
historical fonts
long s (ſ)
historical ligatures:
Æ, æ, Œ, œ, st, 
polytonic Greek words
diacritics
abbreviations
historical spellings
Problems
Some problems for OCR engines (continued)
● historical typography and spelling are also a problem for early modern languages
● ambiguities of abbreviations (especially in incunabula) will not immediately lead to fully expanded, machine readable text
● but discretionary diacritics are helpful in POS/morphology disambiguation:
  – adverb/vocative: altè/alte
  – adverb/pronoun: quàm/quam
  – conjunction/preposition: cùm/cum
  – ablative/nominative: hastâ/hasta
Problems
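A minimal sketch of how the diacritic pairs listed on this slide could feed POS/morphology disambiguation, and of what is lost when OCR drops the accent. The lookup table only holds the four pairs from the slide; the function names are hypothetical, and reading an unmarked form as "the other" category is an assumption that only holds for printings that use these diacritics consistently.

```python
import unicodedata

# Readings taken from the four example pairs on the slide (marked form first).
_RAW_READINGS = {
    "alt\u00e8": "adverb", "alte": "vocative",
    "qu\u00e0m": "adverb", "quam": "pronoun",
    "c\u00f9m": "conjunction", "cum": "preposition",
    "hast\u00e2": "ablative", "hasta": "nominative",
}
# Normalize keys so that precomposed and decomposed input both match.
READINGS = {unicodedata.normalize("NFC", k): v for k, v in _RAW_READINGS.items()}


def reading(token):
    """Return the reading suggested by the diacritic, or None if unknown."""
    key = unicodedata.normalize("NFC", token).lower()
    return READINGS.get(key)


def strip_diacritics(token):
    """Simulate an OCR engine that loses accents: drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for form in ("qu\u00e0m", "quam", "hast\u00e2"):
        print(form, "->", reading(form), "| without accent:", strip_diacritics(form))
```

Once the accent is stripped, both members of a pair collapse to the same string, which is why recognizing the diacritic correctly matters for later annotation.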
State of the art – example pages
Prospects
[Figure: example pages from printings of 1544, 1649 and 1779]
State of the art – results for example pages
Prospects
character accuracy in %

Year   Abbyy FR 11.1   Tesseract 3.03   OCRopus 0.7
1544   83.14           70.32            74.59
1649   88.07           84.87            78.98
1779   82.13           80.77            75.46

out-of-the-box performance, no language model (or default = English)
OCRopus hampered by bad image-text segmentation
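The slides do not say how character accuracy was computed. One common definition, assumed here, is 100 × (1 − edit distance / length of the ground truth), with the edit distance counted in character insertions, deletions and substitutions. The function names below are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_output):
    """Character accuracy in percent, relative to the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    errors = levenshtein(ground_truth, ocr_output)
    return 100.0 * (1.0 - errors / len(ground_truth))


if __name__ == "__main__":
    # Long s misread as f and a lost grave accent: 2 errors in 12 characters.
    print(character_accuracy("qu\u00e0m \u017fapiens", "quam fapiens"))
```

Under this definition a very noisy output can even score below zero, because insertions are counted against the ground-truth length.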
Prospects
Overcoming the obstacles
● Training (Tesseract, OCRopus)
  – (a) generate pseudo-historical images from existing texts and historical-looking computer fonts (add some degradation to the image)
  – (b) transcribe some real pages and train on true historical fonts
● Lexical resources in recognition (Tesseract)
● Post-processing
  – correct OCR errors, but not historical spelling (which might be interesting in itself)
  – add annotation: expand abbreviations and ligatures, normalize spelling
  – helpful: language model, lexicon of historical word forms
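The annotation step in the list above keeps the historical form and adds a normalized one next to it, instead of overwriting the OCR result. A minimal sketch, assuming a small hand-made mapping for the long s and a few ligatures; the table and the function name are illustrative only, and abbreviation expansion is not covered.

```python
# Illustrative mapping from historical glyphs to modern spellings (an assumption,
# not a complete inventory). All-caps words would need "AE"/"OE" instead.
NORMALIZATION = str.maketrans({
    "\u017f": "s",    # long s
    "\u00c6": "Ae",   # ligature AE
    "\u00e6": "ae",   # ligature ae
    "\u0152": "Oe",   # ligature OE
    "\u0153": "oe",   # ligature oe
    "\ufb06": "st",   # ligature st
})


def annotate(token):
    """Return the token as printed together with a normalized reading."""
    return {"original": token, "normalized": token.translate(NORMALIZATION)}


if __name__ == "__main__":
    for word in ("\u017fcienti\u00e6", "p\u0153na", "e\ufb06"):
        print(annotate(word))
```

Keeping both layers means the historical spelling stays available for study while search and linguistic tools can work on the normalized form.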
Progress
Postcorrection: Open-Source-Tool PoCoTo
(see the paper by Vobl et al.; presentation by Christoph Ringlstetter)
Progress
Training on historical fonts (artificial images)
Example: Pontanus, Progymnasmata Latinitatis (1589)
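The slides do not describe how the artificial images were degraded. As a rough sketch of the idea only, the following flips random pixels in a binary line image (salt-and-pepper noise) so that clean, font-rendered text looks less perfect. The image representation (rows of 0/1 values), the function name and the default probability are all assumptions; rendering the text with a historical-looking font is left out.

```python
import random


def degrade(image, flip_prob=0.02, seed=None):
    """Return a copy of a binary image with each pixel flipped with flip_prob.

    image: list of rows, each a list of 0 (background) or 1 (ink).
    """
    rng = random.Random(seed)
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in image
    ]


if __name__ == "__main__":
    clean = [
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
    ]
    for row in degrade(clean, flip_prob=0.2, seed=42):
        print("".join("#" if px else "." for px in row))
```

A fixed seed makes the generated training set reproducible; real degradation models usually add blur, skew or ink spread as well.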
Progress
Training on fonts, ideal lexicon
Example: Pontanus, Progymnasmata Latinitatis (1589)
character accuracy in %

Page   Abbyy FR 11.1   Tesseract 3.03   OCRopus 0.7   Tesseract (font)   Tesseract (font + lex.)   OCRopus (font)
15     87.79           80.88            80.70         91.02*             93.90*                    92.55*
16     82.94           77.41            76.94         80.12              85.65*                    80.47
17     85.25           75.98            86.07*        85.41*             91.56*                    93.93*
18     85.93           79.51            85.53         88.29*             92.68*                    89.67*
19     87.94           80.09            79.09         86.06              90.15*                    87.83

OCRopus: no language model!
*: accuracy better than Abbyy (shown in red on the original slide)
Progress
Training on historical fonts (real images)
Example: Thanner, Petronij Arbitri Sathyra (1500)
character accuracy in %

Page   Tesseract 3.03   OCRopus 0.7   OCRopus (trained)
13     41.59            44.59         93.15
14     52.38            57.77         94.61
15     53.09            62.38         95.17
16     59.09            61.45         93.27

pages 1-12: training set; pages 13-16: test set
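To compare the three columns at a glance, the per-page figures for the test pages can be averaged. The sketch below uses the values from the table; note that this is an unweighted mean over pages, which is an assumption of this example and not necessarily the accuracy over all characters of the test set.

```python
from statistics import mean

ENGINES = ("Tesseract 3.03", "OCRopus 0.7", "OCRopus (trained)")

# Character accuracy in % for the test pages 13-16, copied from the table.
RESULTS = {
    13: (41.59, 44.59, 93.15),
    14: (52.38, 57.77, 94.61),
    15: (53.09, 62.38, 95.17),
    16: (59.09, 61.45, 93.27),
}


def mean_accuracy(results):
    """Unweighted mean per engine over all pages in results."""
    columns = zip(*results.values())
    return {engine: mean(col) for engine, col in zip(ENGINES, columns)}


if __name__ == "__main__":
    for engine, value in mean_accuracy(RESULTS).items():
        print(f"{engine}: {value:.2f}")
```

On these four pages the means come out at roughly 51.5, 56.5 and 94.1 percent.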
Progress
Summary
● very old printings are hard to OCR out of the box
● Tesseract and OCRopus can be trained to results above ABBYY
● applying lexica as well as font training helps a lot
● OCRopus can be trained to accuracies > 90%, but must at present be combined with good line segmentation in a preprocessing step
● postcorrection will do the rest
Progress
Thank you for your interest!

Datech2014 - Session 4 - OCR of Historical Printings of Latin Texts: Problems, Prospects, Progress
