Presentation of KU Leuven's digitisation work with historical languages, given at the Impact Centre of Competence Annual General Meeting
2. Intro: KU Leuven Digitisation
• University Library Central Services
• Digitisation projects and programmes
o Research, education, heritage
o Coordination, facilitation
• Imaging Lab
o Focus on quality
o Focus on innovation
3. Intro: LIBIS
• IT solutions for collection management
o Archives, libraries, museums
o Development and support for a network extending beyond KU Leuven
o LIAS
• Solutions for researchers
o Scientific data management, collaboration, sharing
o Multiple environments
• Centre of expertise
• Project oriented
4. Lines of Work and Issues
• Output formats
• Historical languages: Latin
• Historical languages: Demotic and friends
• Printed statistical tables
• Manuscripts and handwritten materials
• Workflow management
5. Output formats
• SUCCEED
• OCR engines generate TEI that does not use all features of the standard.
• Reduces the value of OCR-generated TEI as a starting point for research.
• Looking for:
o A way to improve the quality of TEI generated by OCR engines
• Possible input:
o LIBIS expertise and know-how
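One way to make the "does not use all features" concern measurable is to inventory which TEI elements an OCR-generated file actually contains. The sketch below is only an illustration under stated assumptions: the sample document and the list of wanted elements (WANTED) are hypothetical and were not taken from any OCR engine or from the SUCCEED project. Only the TEI namespace and the element names pb, lb, persName, placeName and date come from the TEI standard itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical OCR output: only paragraphs and line breaks are encoded.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>First line of the page<lb/>second line of the page</p>
    <p>Second paragraph</p>
  </body></text>
</TEI>"""


def element_inventory(tei_xml: str) -> Counter:
    """Count how often each TEI element occurs in the document."""
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    for elem in ET.fromstring(tei_xml).iter():
        tag = elem.tag
        # ElementTree reports namespaced tags as "{namespace}name".
        if tag.startswith(prefix):
            tag = tag[len(prefix):]
        counts[tag] += 1
    return counts


def missing_elements(counts: Counter, wanted: list) -> list:
    """Return the wanted element names that never occur, sorted by name."""
    return sorted(set(wanted) - set(counts))


# Assumed wish list of elements a researcher might want as a starting point.
WANTED = ["pb", "lb", "persName", "placeName", "date"]

inventory = element_inventory(SAMPLE_TEI)
missing = missing_elements(inventory, WANTED)
print("Elements used:", dict(inventory))
print("Wanted but absent:", missing)
```

Running such an inventory over the output of several engines would give a simple, comparable picture of how much of the standard each one uses.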
6. Historical languages: Latin
• Course notes by students of the old university of Leuven
• Western Europe: Latin essential for historical research
• Fragmented efforts, hard to track, difficult to establish cooperation
• Looking for:
o Highly automated and accurate OCR = limited manual intervention
o Lexica, NER
• Possible input:
o Text material from different periods and locations
o Academic input: Neo-Latin, …
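To illustrate how a lexicon can limit manual intervention, the sketch below flags OCR tokens that do not occur in a word list, so that a corrector only has to look at the flagged words. It rests on assumptions: the tiny LEXICON and the sample OCR line are invented for the example, and a real Latin lexicon would also have to cover inflected forms, abbreviations and spelling variants from different periods.

```python
import re

# Hypothetical miniature lexicon; a real one would hold many thousands of forms.
LEXICON = {"in", "principio", "erat", "verbum", "et", "apud", "deum"}


def unknown_tokens(text: str, lexicon: set) -> list:
    """Return the lower-cased word tokens of text that are not in the lexicon."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in lexicon]


# Invented OCR line with one misread word ("verbnm" for "verbum").
ocr_line = "In principio erat verbnm et verbum erat apud deum"
flagged = unknown_tokens(ocr_line, LEXICON)
print(flagged)
```

A flagged token is not necessarily an OCR error (it may simply be missing from the lexicon), which is one reason why text material from different periods and locations is valuable input.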
8. Historical languages: other
• Latin is not the only important historical language
• Precursors of contemporary spoken languages
• No specific projects for now
• Certainly important for our researchers, Hebrew for instance
• Looking for:
o Initiatives we might join
9. Printed statistical tables
• Recensement général des industries et des métiers (31 octobre 1896)
• Nineteenth-century statistical material
• Very hard to use for research due to sheer size and complexity
• Solution: digitisation followed by OCR
• Output: spreadsheets or functional equivalents
• Looking for:
o Extremely accurate OCR for numeric materials
o Correct translation of dense table layouts
o Tools for preparation of the digitised images and quality control
• Possible input:
o Digitised source material
o Expertise: Departments of Electrical Engineering, Economic History, Historical Demography
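Printed statistical tables often carry their own checksums in the form of totals, which can support the quality control mentioned above. The sketch below shows one possible check under stated assumptions: it supposes each OCR'd row has been split into part counts and a printed total, and that spaces or full stops are used as thousands separators. The row labels and numbers are invented and do not come from the 1896 census.

```python
def parse_count(cell: str) -> int:
    """Parse a printed count such as '1 234' or '1.234' into an integer.

    Raises ValueError when the cell still contains non-digit characters,
    for example a letter O that the OCR engine produced instead of a zero.
    """
    cleaned = cell.replace(" ", "").replace(".", "")
    return int(cleaned)


def rows_failing_total_check(rows: list) -> list:
    """Return the labels of rows whose parts do not add up to the printed total."""
    failing = []
    for label, parts, total in rows:
        try:
            values = [parse_count(part) for part in parts]
            printed_total = parse_count(total)
        except ValueError:
            # An unparseable cell is itself a sign of an OCR problem.
            failing.append(label)
            continue
        if sum(values) != printed_total:
            failing.append(label)
    return failing


# Invented rows: (label, part counts as recognised by OCR, printed total).
ROWS = [
    ("Row A", ["1 204", "318"], "1 522"),  # consistent
    ("Row B", ["2 017", "45O"], "2 467"),  # letter O instead of zero
    ("Row C", ["3 110", "29"], "3 189"),   # parts sum to 3 139, not 3 189
]

suspect = rows_failing_total_check(ROWS)
print(suspect)
```

A check like this cannot say which cell is wrong, but it narrows manual verification down to the rows that fail.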
10. How to deal with complex layouts, columns and figures?
11. Manuscripts and handwritten material
• RICH + Bible of Anjou
• Ready to contribute material as content holder
• Working on a programme about letters
12. Workflow management
• Digicorder + Teamwork
• How do others deal with workflow management?
• Where to position enrichment in digitisation workflow?
• Ready to participate in the production of Webinars
13. Digicorder = tool to manage naming of projects and scans
Created by Diederik Lanoye using FileMaker
14. Options when creating unique names for scans and corresponding labels
Starting point = object to be digitised
Label = description of part of object or number of page or folio
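The idea of deriving unique scan names from an object and a label can be sketched as follows. This is not Digicorder's actual scheme, which the slides do not spell out: the object identifier OBJ0001, the four-digit sequence number, the .tif extension and the way labels are normalised are all assumptions made for the example.

```python
import re


def scan_name(object_id: str, sequence: int, label: str, ext: str = "tif") -> str:
    """Build a file name from the object, a running number and a label.

    The label (part of object, page or folio number) is reduced to
    lower-case letters and digits joined by hyphens so it is file-system safe.
    """
    safe_label = re.sub(r"[^A-Za-z0-9]+", "-", label).strip("-").lower()
    return f"{object_id}_{sequence:04d}_{safe_label}.{ext}"


def names_for_object(object_id: str, labels: list) -> list:
    """Name every scan of one object in capture order.

    The running number makes each name unique within the object,
    even if two scans carry the same label.
    """
    return [
        scan_name(object_id, number, label)
        for number, label in enumerate(labels, start=1)
    ]


# Hypothetical object with three scans.
names = names_for_object("OBJ0001", ["Front cover", "f. 1r", "f. 1v"])
for name in names:
    print(name)
```

Keeping the human-readable label in the name helps operators, while the object identifier and running number carry the uniqueness.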
19. Milestones are defined for important moments in the workflow
Often in case of transitions
More information: https://www.teamwork.com/projects/
20. You never walk alone
o Issues are not specific to KU Leuven
o Sharing expertise to cover all aspects is the only way to go
o Valuable expertise in specific fields
• Neo-Latin and humanist Latin
• Historical demography and economic history
• Imaging
o On our wishlist:
• Cooperation in new and on-going developments
• Exchange of expertise
• Above all: action
21. Cooperation
• Wiki as a starting point, interesting initiative
• Who wants to join forces?
o Writing projects together
o Searching for funding
• Important:
o Automated
o Accurate
o Scalable and Maintainable
o Cost effective
• digitalisering@bib.kuleuven.be
• Hoping to return to Leuven with names, specific suggestions, and appointments for meetings to discuss proposals
22. Appendix: Center for Processing Speech and Images
• The Center for Processing Speech and Images (PSI) is one of the units within the Department of Electrical Engineering (ESAT) at KU Leuven. It specialises in computer vision, with object and object class recognition among its most important research domains. Besides more general goals such as scene understanding, segmentation and invariant object recognition, it has experience with character recognition on licence plates and with automatic recognition of handwritten music scores for transcription into modern music notation.
With more than 60 researchers, it is one of the biggest research groups of its kind in Europe and has extensive experience in national and international projects. Two of its professors have received ERC grants from the European Commission and have won several other prestigious prizes.