TEI Conference - CVCE

Presentation at the TEI Conference and Members' Meeting 2015, October 28-31, Lyon, France

  1. From a Small-Scale Digital Edition to a TEI Publication Framework in Modern European History. Florentina Armaselu, DHLab, Centre virtuel de la connaissance sur l’Europe (CVCE), Luxembourg (florentina.armaselu@cvce.eu, www.cvce.eu). Text Encoding Initiative (TEI) Conference and Members’ Meeting: Connect, Animate, Innovate. 28 to 31 October 2015, Université Lumière Lyon 2.
  2. Summary: 1. The WEU-DIPLO pilot project; 2. Transviewer, towards a TEI publication framework; 3. Discussion; 4. References.
  3. Part I. The WEU-DIPLO pilot project
  4. Overview of the WEU-DIPLO project
     1. Goal: XML-TEI encoding, corpus analysis and Web publication of institutional documents of the Western European Union (WEU):
        • Topics: armament production, standardization and control in the period from 1954 to 1982;
        • Source: Archives nationales de Luxembourg, WEU collection.
     2. Initial format:
        • digitized versions (JPG) of typewritten materials (one file per page).
     3. Size (documents and pages, in total and per language):
        Category        Docs  Docs EN  Docs FR  Docs FR proc.*  Pages  Pages EN  Pages FR  Pages FR proc.*
        Note              89       43       46              37    395       191       204              155
        Minutes           30       15       15              15    256       138       118              118
        Memorandum         3        1        2               2     16         7         9                9
        Study              2        0        2               1     12         0        12                8
        Discourse          1        0        1               0      4         0         4                0
        Draft protocol     2        1        1               0      4         2         2                0
        Total            127       60       67              55    687       338       349              290
        * proc. = processed
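The figures in the size table can be cross-checked with a few lines of Python. This is only a sanity check of the arithmetic, not part of the project's workflow; the dictionary layout and the function names are assumptions made for illustration.

```python
# Counts copied from the size table. Column order (an illustrative choice):
# documents, EN docs, FR docs, FR processed docs,
# pages, EN pages, FR pages, FR processed pages
corpus = {
    "Note": (89, 43, 46, 37, 395, 191, 204, 155),
    "Minutes": (30, 15, 15, 15, 256, 138, 118, 118),
    "Memorandum": (3, 1, 2, 2, 16, 7, 9, 9),
    "Study": (2, 0, 2, 1, 12, 0, 12, 8),
    "Discourse": (1, 0, 1, 0, 4, 0, 4, 0),
    "Draft protocol": (2, 1, 1, 0, 4, 2, 2, 0),
}

# The "Total" row as reported on the slide.
reported_total = (127, 60, 67, 55, 687, 338, 349, 290)


def column_totals(rows):
    """Sum every column over all categories."""
    return tuple(sum(column) for column in zip(*rows.values()))


def processed_share(total):
    """Fraction of French pages that were processed (FR proc. / FR)."""
    return total[7] / total[6]


if __name__ == "__main__":
    print(column_totals(corpus) == reported_total)
    print(f"{processed_share(reported_total):.0%} of French pages processed")
```

The per-language columns also add up row by row: for every category, the EN and FR counts sum to the category total, for documents and for pages.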
  5. Overview of the WEU-DIPLO project: workflow
  6. Overview of the WEU-DIPLO project: page structure (header, content, footer). ©WEU-UEO
  7. Microsoft Word styling (WEU-DIPLO): headers and footers; headings, line breaks and paragraphs.
  8. Conversion and enrichment (XSLT, manual, NER)
     • OxGarage: DOCX to TEI P5 conversion;
     • oXygen XML Editor: XSLT transformation (metadata, structure); manual enrichment (semantics: discourse of country/institutional representatives);
     • GATE (Named Entity Recognition): training phase (Gazetteer List Collector); annotation phase (names of persons, organisations, places, functions, events, products; dates);
     • oXygen XML Editor: XSLT (GATE XML to TEI P5 transformation).
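The last step above (GATE XML to TEI P5) was done with XSLT in the project. As a rough illustration of the idea only, the following Python sketch renames inline annotation elements to TEI naming elements. The input format, the annotation names (`Person`, `Organization`, `Location`, `Date`), the `gateId` attribute and the sample sentence are all hypothetical; real GATE output and the project's stylesheet may differ, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from annotation element names to TEI P5 elements.
ANNOTATION_TO_TEI = {
    "Person": "persName",
    "Organization": "orgName",
    "Location": "placeName",
    "Date": "date",
}


def annotations_to_tei(xml_text):
    """Rename inline annotation elements to their TEI counterparts."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        tei_name = ANNOTATION_TO_TEI.get(element.tag)
        if tei_name is not None:
            element.tag = tei_name
            element.attrib.clear()  # drop tool-specific attributes
    return ET.tostring(root, encoding="unicode")


# Invented sample sentence, not taken from the WEU corpus.
sample = (
    '<p>Statement by <Person gateId="1">Mr. Smith</Person> on behalf of the '
    '<Organization gateId="2">Western European Union</Organization>.</p>'
)

if __name__ == "__main__":
    print(annotations_to_tei(sample))
```

Elements that are not in the mapping are left untouched, so structural markup such as paragraphs passes through unchanged.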
  9. XML-TEI encoding (WEU-DIPLO): metadata; layout (header). Markers shown on the slide: @@hAuthor, @@hArchNum, @@hStampConfid, @@hDocRef, @@hOrigDate, @@hOrigLang, @@hVersion. ©WEU-UEO
  10. XML-TEI encoding (WEU-DIPLO): structure (headings, paragraphs, line breaks); semantics (named entities, discourse). Markers shown on the slide: @@Heading2, @@Paragraph, @@LineBreak, @@Names, @@Discourse. ©WEU-UEO
  11. XML-TEI encoding (WEU-DIPLO): transcription features (Pierazzo, 2011)
  12. Part II. Transviewer, towards a TEI publication framework
  13. Context: the CVCE’s ePublications
      • Treaties; official declarations and meeting reports; letters; notes; press articles; images, video and audio archives related to European integration history.
  14. Transviewer overview
      1. Transviewer concept:
         • XML-TEI transformation/visualisation on the fly, in the browser;
         • flexible framework for the publication of XML-TEI documents in European integration history.
      2. Technologies:
         • XML, HTML, XSLT, CSS and JavaScript.
      3. Tested platforms:
         • EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
         • KILN: http://kiln.readthedocs.org/en/latest/#
         • TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
         • Versioning Machine: http://v-machine.org/
         • XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/
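Transviewer performs the TEI-to-HTML transformation with XSLT in the browser. To make the "on the fly" idea concrete without reproducing that setup, the sketch below swaps in plain Python: it reads a TEI document and renders its body directly to HTML, with no intermediate file. The rendering rules, the `tei-` class prefix, the function names and the sample document are illustrative assumptions, not Transviewer's actual stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Illustrative rendering rules (assumptions): TEI local name -> HTML tag.
BLOCK_RULES = {"head": "h2", "p": "p", "div": "div"}


def local_name(tag):
    """Strip the namespace part from an ElementTree tag."""
    return tag.split("}", 1)[-1]


def render(element):
    """Render one TEI element, its children and its trailing text as HTML."""
    name = local_name(element.tag)
    if name == "lb":
        html = "<br/>"  # a TEI line break becomes an HTML line break
    else:
        inner = escape(element.text or "")
        inner += "".join(render(child) for child in element)
        if name in BLOCK_RULES:
            tag = BLOCK_RULES[name]
            html = f"<{tag}>{inner}</{tag}>"
        else:
            # Unknown elements keep their TEI name as a CSS hook.
            html = f'<span class="tei-{name}">{inner}</span>'
    return html + escape(element.tail or "")


def tei_body_to_html(tei_text):
    """Transform the body of a TEI document into an HTML fragment."""
    root = ET.fromstring(tei_text)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    return "".join(render(child) for child in body)


# Invented sample document.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    "<head>Minutes</head>"
    "<p>First line<lb/>second line by <persName>A. Delegate</persName>.</p>"
    "</body></text></TEI>"
)

if __name__ == "__main__":
    print(tei_body_to_html(sample))
```

Because the HTML is derived from the XML each time it is requested, a correction to the TEI source shows up in the rendered view without a separate export step.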
  15. Transviewer prototype
      Implementation (adaptation and in-house development):
      • side-by-side view of digital facsimile and transcription (EVT model);
      • third-party libraries:
        o BookReader: tool designed to provide online access to scanned books;
        o Saxon-CE: support for XSLT 2.0 transformation in the browser;
      • in-house development (configuration, layout and actions of frames and buttons, transcription rendering, calls to third-party libraries).
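A side-by-side view needs to know which transcription text belongs to which page image. One common way to record this in TEI is a `pb` (page beginning) element with a `facs` attribute pointing to the image. The sketch below, which is an assumption about a possible encoding and not the prototype's code, groups the text that follows each `pb` with that image reference. It assumes `pb` elements are direct children of `body`; the image file names and the sample text are invented.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def pages_with_facsimiles(tei_text):
    """Return a list of (image reference, page text) pairs."""
    root = ET.fromstring(tei_text)
    body = root.find(f".//{TEI_NS}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    pages = []
    for child in body:
        if child.tag == f"{TEI_NS}pb":
            # A page break opens a new page linked to its image.
            pages.append((child.get("facs"), []))
        elif pages:
            # Collapse whitespace and attach the block to the current page.
            text = " ".join("".join(child.itertext()).split())
            pages[-1][1].append(text)
    return [(facs, "\n".join(blocks)) for facs, blocks in pages]


# Invented sample document with hypothetical image names.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<pb facs="page-001.jpg"/>'
    "<head>Note</head><p>Text of the first page.</p>"
    '<pb facs="page-002.jpg"/>'
    "<p>Text of the second page.</p>"
    "</body></text></TEI>"
)

if __name__ == "__main__":
    for image, text in pages_with_facsimiles(sample):
        print(image, "|", text.replace("\n", " / "))
```

Each pair can then feed the two panels: the image reference goes to the facsimile viewer and the text to the transcription panel.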
  16. Transviewer experiments: digital facsimile/transcription side-by-side view. ©WEU-UEO
  17. Transviewer experiments: digital facsimile/transcription side-by-side view. Werner, handwritten notes
  18. Transviewer experiments (simulation): video/audio and transcription synchronisation. Werner, interviews
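The slide only shows a simulation of the synchronisation, so the following is a guess at the underlying logic rather than a description of the prototype: if each transcription segment carries a start time, the segment to highlight at a given playback position can be found with a binary search. The data layout (a sorted list of start time in seconds and text) and the function name are assumptions, and the sample segments are invented.

```python
from bisect import bisect_right


def active_segment(segments, playback_time):
    """Return the text of the segment playing at playback_time.

    segments is a list of (start_seconds, text) pairs sorted by start time.
    Returns None if playback has not reached the first segment yet.
    """
    starts = [start for start, _ in segments]
    index = bisect_right(starts, playback_time) - 1
    return segments[index][1] if index >= 0 else None


# Invented sample segments.
segments = [
    (0.0, "Opening question."),
    (4.5, "First answer."),
    (9.0, "Follow-up question."),
]

if __name__ == "__main__":
    print(active_segment(segments, 5.0))
```

In a browser, the same lookup would typically run on each time update of the media player and move the highlight in the transcription panel.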
  19. Transviewer features: panel layouts
  20. Transviewer features: transcription format
  21. Transviewer features: panel interlinking
  22. Part III. Discussion
  23. Discussion
      “By teaching an edition how to swim, I mean endowing an edition not only with a store of factual knowledge concerning the work presented, but also with the capability of dealing gracefully with the mutability of the electronic medium, by exploiting the possibilities for reader-controlled changes to the edition’s presentation and by adapting successfully to rapid changes in the hardware and software environment.” (Sperberg-McQueen, 2009)
      1. Transviewer prototype questions:
         • flexible enough to support different types of documents in European integration history and different user requirements;
         • modular architecture to allow gradual development and customisation according to the needs of the projects;
         • balance between manual interventions and automatic processing (XSLT, NER);
         • XML transformation on the fly (no need for intermediary formats/steps; changes to the XML are already part of the publication).
  24. Discussion (continued)
      3. Issues:
         • BookReader: use of an older version of the jQuery library;
         • non-uniform support of Saxon-CE for XSLT 2.0 transformation across browsers;
         • need for batch conversion to XML-TEI (potential adaptation of OxGarage for batch processing).
      4. Ongoing/future work for further development:
         • evaluation (technology: technical experts; usability tests: experts in European integration studies);
         • development of new modules (multi-panels, audio/video transcription, etc.) and tests with more project samples;
         • integration into the CVCE’s existing Website architecture:
           o Back End;
           o Front End.
  25. Scaling in a publication framework would imply not only teaching your editions “how to swim” but also how to swim together. Thank you!
  26. References
      • BookReader: https://openlibrary.org/dev/docs/bookreader
      • EVT (Edition Visualization Technology): http://sourceforge.net/projects/evt-project/
      • GATE: https://gate.ac.uk/
      • KILN: http://kiln.readthedocs.org/en/latest/#
      • OxGarage: http://www.tei-c.org/oxgarage/
      • Pierazzo, Elena. 2011. “A rationale of digital documentary editions”. In LLC. The Journal of Digital Scholarship in the Humanities, Vol. 26, No. 4, December 2011, pp. 463-477.
      • http://www.scholarlyediting.org/2014/essays/essay.pierazzo.html
      • Saxon-CE: http://www.saxonica.com/ce/user-doc/1.1/index.html
      • Sperberg-McQueen, C.M. 2009. “How to teach your edition how to swim”. In LLC. The Journal of Digital Scholarship in the Humanities, Vol. 24, No. 1, April 2009. Oxford Journals.
      • TEI (Text Encoding Initiative): http://www.tei-c.org
      • TEIBoilerplate: http://dcl.ils.indiana.edu/teibp/
      • Versioning Machine: http://v-machine.org/
      • XTF (eXtensible Text Framework): http://xtf.cdlib.org/about/
