CARE: Curation of Dutch Regional Dialect Dictionaries
Nicoline van der Sijs, Henk van den Heuvel, Roeland van Hout, Eric Sanders
CLS/CLST, Radboud University Nijmegen, The Netherlands
The CARE project is funded by CLARIN-NL under grant number 15-004
1. Aim of project
• Define a generic database structure for dialect dictionaries (LMF)
• Link the structure to Woordenboek van de Vlaamse Dialecten (WVD) and other regional dictionaries
• Curate Woordenboek van de Brabantse Dialecten (WBD) and Woordenboek van de Limburgse Dialecten (WLD), Parts I and II
• Update the curation of WBD and WLD, Part III
• Include Woordenboek van de Gelderse Dialecten (WGD)
Where we start
• OCR version of PDF files (WBD & WLD, Parts I and II)
• Formerly curated TSV files for WBD & WLD, Part III
• FP5 files of WGD
What we deliver
• Generic LMF model for dialect dictionaries
• WBD, WLD as CSV files and LMF files
  - For at least 32 of 42 books of Parts I and II
  - For all 28 books of Part III
• Original PDFs of books
• CMDI files per Part
• Curation Reports
Generic aspects
• LMF model suitable for all kinds of dialect dictionaries
• CMDI metadata profile
• Very flexible LMF conversion script
[Figure: workflow diagram with the nodes PDF book, CSV files, LMF files, CMDI files and CLARIN Data Centre]
CLARIN Data Centre: Meertens Institute
• Adding Persistent Identifiers
• Storage
CMDI
- Metadata profile includes:
  - Link to LMF
LMF script
- Converts CSV file into LMF
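The CSV-to-LMF step can be illustrated with a minimal Python sketch. It is not the project's actual script: the column names (lemma, dialect_form, kloeke_code), the element layout, the helper names and the sample rows are all assumptions made for illustration, and the output is only LMF-like (feat elements with att/val attributes), not validated against the CARE LMF model.

```python
import csv
import io
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> element to parent."""
    ET.SubElement(parent, "feat", {"att": att, "val": val})


def csv_to_lmf(csv_text, delimiter=","):
    """Build an LMF-like XML tree from CSV text.

    Assumed (hypothetical) columns: lemma, dialect_form, kloeke_code.
    Rows that share a lemma are grouped under one LexicalEntry.
    """
    root = ET.Element("LexicalResource")
    lexicon = ET.SubElement(root, "Lexicon")
    entries = {}
    for row in csv.DictReader(io.StringIO(csv_text), delimiter=delimiter):
        lemma = row["lemma"].strip()
        if lemma not in entries:
            entry = ET.SubElement(lexicon, "LexicalEntry")
            feat(ET.SubElement(entry, "Lemma"), "writtenForm", lemma)
            entries[lemma] = entry
        # One WordForm per attested dialect form, tagged with its location code.
        word_form = ET.SubElement(entries[lemma], "WordForm")
        feat(word_form, "writtenForm", row["dialect_form"].strip())
        feat(word_form, "geographicalVariant", row["kloeke_code"].strip())
    return root


# Invented sample rows, for illustration only.
sample = (
    "lemma,dialect_form,kloeke_code\n"
    "aardappel,erpel,L270p\n"
    "aardappel,petat,K005p\n"
)
lmf_root = csv_to_lmf(sample)
lmf_xml = ET.tostring(lmf_root, encoding="unicode")
```

Grouping rows by lemma keeps the CSV flat (one attestation per row) while the XML carries the dictionary structure, which is one plausible reason to keep both formats as deliverables.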
CSV script
- Converts a text file with typographic codes into a CSV file by:
  - Typographic and text cleaning
  - Categorization of information based on typography
  - Recoding dialect forms
  - Checking and expanding Kloeke codes
- Log file is used for iterative manual correction
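The checking and expanding of Kloeke codes, together with the log file that drives manual correction, might look roughly like the Python sketch below. Several things here are assumptions rather than the project's method: the simplified code shape (one capital letter, three digits, optional lowercase suffix), the reading of "expanding" as filling in a region letter omitted in a list, and the function name expand_kloeke_codes.

```python
import re

# Assumed, simplified shape of a Kloeke code: one capital letter, three digits
# and an optional lowercase suffix (e.g. "L270p").
FULL_CODE = re.compile(r"^([A-Z])\s?(\d{3}[a-z]?)$")
BARE_NUMBER = re.compile(r"^\d{3}[a-z]?$")


def expand_kloeke_codes(raw, log):
    """Normalise a comma-separated code list such as "L 270p, 271, K005".

    A bare number inherits the region letter of the preceding full code
    (an assumed meaning of "expanding"). Anything unparseable is appended
    to `log` so it can be corrected by hand before the script is rerun.
    """
    codes = []
    letter = None
    for item in (part.strip() for part in raw.split(",")):
        full = FULL_CODE.match(item)
        if full:
            letter = full.group(1)
            codes.append(letter + full.group(2))
        elif BARE_NUMBER.match(item) and letter is not None:
            codes.append(letter + item)
        else:
            log.append(f"unrecognised Kloeke code: {item!r}")
    return codes


log = []
codes = expand_kloeke_codes("L 270p, 271, K005, 12x", log)
# codes is ["L270p", "L271", "K005"]; "12x" ends up in the log.
```

Collecting problems in a log instead of stopping at the first one suits an iterative workflow: an assistant fixes the logged items in the text file, the script is run again, and the log shrinks with each pass.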
Manual preprocessing by trained assistants, gratefully acknowledged: Aukje Borkent, Maaike Borst, Eline Dimmendaal, Jorik van Engeland and Inge Otto
- Addition of typographic codes for comments (“Toelichting”) in text file
- Correcting script errors