This presentation discusses the development of a metadata model for digitised newspapers. The work package aims to gather the metadata models libraries currently use, design a comprehensive new model called ENMAP based on de-facto standards such as METS, MODS, MARC and ALTO, and manage the feedback cycles on the format. The model is to be complemented by a data dictionary that defines the structural elements and text types found in newspapers. Structural elements include title sections, headlines, advertisements, illustrations, caption lines, running titles and page numbers. Text types include breaking news, reviews, obituaries, advertisements, weather forecasts and more. The objective is to provide clear definitions and examples so that libraries can specify these elements for service providers and so that search and crowd-based services can use them. Feedback is sought on how to define the elements and on how texts interact with their readers.
2. This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the
Competitiveness and Innovation Framework Programme by the European Community
http://ec.europa.eu/ict_psp
WP 5 Metadata
•The main objectives are
• To gather and analyse metadata models from libraries currently
in use for the digitisation of newspapers.
• To design and release a comprehensive metadata model based
on de-facto standards such as METS, MODS, MARC, ALTO, etc.
• To manage the feedback cycles where stakeholders will comment
on the format
• To prepare an online resource (wiki, database or similar
website) that contains the rules for applying the format and
using it within a digitisation project
•ENMAP (Europeana Newspaper METS ALTO Profile)
• Already used within the project – standardization aspect
• Public release in October 2013
Structural Metadata
•Idea of a data dictionary
• Addition to ENMAP
• What is what? Definition of structural elements / text types
•Structural elements
• Title section
• Headline
• Advertisement
• Illustration
• Caption line
• Running title (column title)
• Page number
• ...
Text types
•Text types (or sub-genres)
• breaking news
• short news
• book review, theatre review, software review,...
• obituary
• advertisement
• family notice
• job announcement
• weather forecast
• novel, poem, ...
• etc.
Objectives
•Set up a data dictionary
• Provide a comprehensive list of elements and text types
• Use clear definitions
• Make the criteria for definitions transparent
• Include many examples from several newspapers
• Make it an open dictionary, so that people can contribute
• Classify structural elements according to their intention
•Rationale
• Many libraries need to define these elements for
• service providers
• search services (faceted search)
• crowd-based services (apply these metadata)
• Currently no other standard is available (TEI covers this only partly)
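One way to picture such an open data dictionary is the minimal Python sketch below. It is an illustration under stated assumptions, not the ENMAP format or the project's actual dictionary: the names `Kind`, `DictionaryEntry` and `DataDictionary` are hypothetical, and the definition texts used with it would be placeholders rather than agreed wording.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """The two groups of entries named on the slides."""
    STRUCTURAL_ELEMENT = "structural element"
    TEXT_TYPE = "text type"


@dataclass
class DictionaryEntry:
    """One entry of the (hypothetical) data dictionary."""
    name: str
    kind: Kind
    definition: str                                     # clear definition (placeholder text)
    criteria: list[str] = field(default_factory=list)   # transparent criteria
    examples: list[str] = field(default_factory=list)   # examples from several newspapers
    intention: str = ""                                 # e.g. "inform", "navigate"


class DataDictionary:
    """Open dictionary: anyone may contribute, names must stay unique."""

    def __init__(self) -> None:
        self._entries: dict[str, DictionaryEntry] = {}

    def contribute(self, entry: DictionaryEntry) -> None:
        key = entry.name.lower()
        if key in self._entries:
            raise ValueError(f"entry already defined: {entry.name}")
        self._entries[key] = entry

    def names(self, kind: Kind) -> list[str]:
        """Sorted entry names of one kind, e.g. as facet values for a search service."""
        return sorted(e.name for e in self._entries.values() if e.kind is kind)
```

A library could then register "Headline" as a structural element and "obituary" as a text type, and a search service could read its facet values from `names(Kind.TEXT_TYPE)`.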
Some considerations to take home...
• Understand text as an interaction with the reader:
• A text may aim to inform, entertain, convince, activate or
support its readers. What are the main interactions in
(historical) newspapers?
• Is the interaction defined by the layout, by the semantic content,
or by a combination of both?
• Are family notices, obituaries, crossword puzzles, poems, novels,
etc. articles or (intellectual) items?
• Is the headline of an article a piece of information, or does it
support the user in navigating through a newspaper?
• Imagine a crowd-based service where users can apply text types
from a list.
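The crowd-based scenario can be sketched in a few lines of Python. This is only a thought experiment under assumptions: the vocabulary is copied from the text types slide, while the function names `apply_text_type` and `consensus` and the simple majority rule are hypothetical choices, not part of the project.

```python
from collections import Counter

# Controlled list offered to users (taken from the text types slide).
TEXT_TYPES = (
    "breaking news", "short news", "book review", "theatre review",
    "obituary", "advertisement", "family notice", "job announcement",
    "weather forecast", "novel", "poem",
)


def apply_text_type(labels, article_id, user, text_type):
    """Record one user's choice; only values from the list are accepted."""
    if text_type not in TEXT_TYPES:
        raise ValueError(f"unknown text type: {text_type}")
    # One label per user and article; a later choice replaces the earlier one.
    labels.setdefault(article_id, {})[user] = text_type


def consensus(labels, article_id, min_share=0.5):
    """Return the text type chosen by more than min_share of users, else None."""
    votes = Counter(labels.get(article_id, {}).values())
    if not votes:
        return None
    text_type, count = votes.most_common(1)[0]
    return text_type if count / sum(votes.values()) > min_share else None
```

Restricting users to a fixed list is what makes the definitions in the data dictionary matter: without clear criteria, two users may reasonably file the same item as "obituary" and "family notice", and no majority emerges.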
Want to contribute?
•Send me your lists of structural elements and how you
defined them!
•For project partners: Have a look at the updated version of
the paper on structural metadata: WP5/documents/structural
MD
•Do not hesitate to take part in the discussion!
8. Thank you for your attention!
Günter Mühlberger
<guenter.muehlberger@uibk.ac.at>