Using LOD to crowdsource Dutch WW2
underground newspapers on Wikipedia
Olaf Janssen, National Library of the Netherlands
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - DCH, 30-08-2017, Berlin, Germany

During the Second World War some 1,300 illegal newspapers were issued by the Dutch resistance.

Right after the war, as many of these newspapers as possible were physically preserved by Dutch memory institutions. They were described in formal library catalogues, which were digitized and brought online in the 1990s. In 2010 the national collection of underground newspapers – some 200,000 pages – was digitized as full text in Delpher, the national aggregator for historical full-texts.

With online metadata and full-texts for these publications in place, the third pillar, "context", was still missing, making it hard for people to understand the historical background of the newspapers.

We are currently running a project to tackle this contextual problem. We started by extracting contextual entries from a hard-copy standard work on the Dutch illegal press and combined these with data from the library catalogue and Delpher in a central Linked Open Data (LOD) triple store.
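
As a rough illustration of what combining one book entry with its catalogue and full-text links could look like as triples, here is a minimal sketch that writes N-Triples by hand using only the standard library. The `example.org` namespace, the choice of Dublin Core and RDFS properties, and the function names are assumptions made for this example; the project's actual data model is not described in the source.

```python
from __future__ import annotations

# Hypothetical namespace: NOT the project's real URIs.
BASE = "http://example.org/verzetskranten/"
DCT = "http://purl.org/dc/terms/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def literal(text: str) -> str:
    """Return an N-Triples string literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def entry_to_ntriples(
    entry_id: int, title: str, catalogue_url: str, fulltext_url: str
) -> list[str]:
    """Describe one newspaper entry and link it to catalogue and full text."""
    subject = uri(f"{BASE}entry/{entry_id}")
    return [
        f"{subject} {uri(DCT + 'title')} {literal(title)} .",
        f"{subject} {uri(DCT + 'identifier')} {literal(str(entry_id))} .",
        # Property choices below are illustrative assumptions.
        f"{subject} {uri(RDFS + 'seeAlso')} {uri(catalogue_url)} .",
        f"{subject} {uri(DCT + 'hasFormat')} {uri(fulltext_url)} .",
    ]


if __name__ == "__main__":
    # Placeholder URLs; real catalogue and Delpher links would go here.
    for line in entry_to_ntriples(
        199,
        "De Geus onder studenten",
        "http://example.org/catalogue/199",
        "http://example.org/fulltext/199",
    ):
        print(line)
```

Keeping the entry number from the book as the subject identifier means the book entry, the catalogue record and the full text all hang off one stable key.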

We then created links between historically related newspapers and used Named Entity Recognition to find persons, organisations and places related to the newspapers. We further semantically enriched the data using DBpedia.

Next, using an article template to ensure uniformity and consistency, we generated 1,300 Wikipedia article stubs from the database.
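
The template step can be pictured with a small sketch like the one below. It fills a fixed wikitext skeleton from a database record, so every stub has the same shape. The template wording, field names and sample values are invented for illustration (the real stubs are in Dutch and their template is not shown in the source); only the wikitext markup for bold text, headings and internal links is standard.

```python
from string import Template

# Hypothetical stub skeleton in wikitext; wording is an assumption.
STUB_TEMPLATE = Template(
    "'''$title''' was an underground newspaper published in $place "
    "during the Second World War.\n\n"
    "== Related newspapers ==\n"
    "$related\n"
)


def render_stub(title, place, related_titles):
    """Render one uniform stub; related titles become internal wiki links."""
    related = "\n".join(f"* [[{t}]]" for t in sorted(related_titles))
    if not related:
        related = "* (none recorded)"
    return STUB_TEMPLATE.substitute(title=title, place=place, related=related)


if __name__ == "__main__":
    # Placeholder values, not taken from the actual database.
    print(render_stub("Example Title", "Example Town", ["Paper B", "Paper A"]))
```

Because the related titles are rendered as internal links, stubs generated this way point at each other, which is one plausible way to get the interlinking mentioned in the slides.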

Finally, we sought collaboration with the Dutch Wikipedia volunteer community to extend these stubs into full encyclopedic articles.

In this way we can give every newspaper its own Wikipedia article, making these WW2 materials much more visible to the Dutch public, over 80% of whom use Wikipedia.

At the same time the triple store can serve as a source for alternative applications, like data visualizations. This will enable us to visualize connections and networks between underground newspapers, as they developed over time between 1940 and 1945.
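
To make the idea of networks developing over time concrete, the sketch below groups newspapers into connected clusters for a given year, which is the kind of structure a visualization could draw. The input shapes (a mapping from newspaper to first and last year of publication, plus a list of related pairs) and all sample data are assumptions; the source does not say how the triple store would be queried.

```python
from collections import defaultdict, deque


def active_components(active_years, links, year):
    """Return clusters of linked newspapers that were both active in `year`.

    active_years: dict mapping newspaper id -> (first_year, last_year)
    links: iterable of (id_a, id_b) pairs marking related newspapers
    """
    active = {p for p, (start, end) in active_years.items() if start <= year <= end}

    # Keep only links whose two ends were both being published that year.
    neighbours = defaultdict(set)
    for a, b in links:
        if a in active and b in active:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    components = []
    for paper in sorted(active):
        if paper in seen:
            continue
        # Breadth-first search collects everything reachable from `paper`.
        queue = deque([paper])
        seen.add(paper)
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(group)
    return components


if __name__ == "__main__":
    # Invented sample data for illustration only.
    years = {"A": (1940, 1942), "B": (1941, 1945), "C": (1943, 1945), "D": (1944, 1945)}
    related = [("A", "B"), ("B", "C")]
    for y in range(1940, 1946):
        print(y, active_components(years, related, y))
```

Running the function once per year from 1940 to 1945 gives a sequence of snapshots that could be animated or plotted side by side.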

---------------

Presentation during the DCH (Digital Cultural Heritage) conference, 30th Aug - 1st Sept 2017, Staatsbibliothek Berlin, Germany - http://dch2017.net


  1. Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia. Olaf Janssen, National Library of the Netherlands. Digital Cultural Heritage, Berlin, 30-08-2017. olaf.janssen@kb.nl - @ookgezellig - slideshare.net/OlafJanssenNL
  2. http://www.4en5meiamsterdam.nl/attachment/47454
  3. During WW2 the Dutch resistance issued many underground newspapers. In every shape & form… http://www.4en5meiamsterdam.nl/attachment/47454
  4. http://resolver.kb.nl/resolve?urn=ddd:010436323 http://resolver.kb.nl/resolve?urn=ddd:010442948 http://resolver.kb.nl/resolve?urn=ddd:010447825 http://resolver.kb.nl/resolve?urn=ddd:010450508 From well-organized, ‘professional’ big titles… (including Parool, Vrij Nederland, Trouw, de Waarheid)
  5. …to very small, amateur, home-made, pamphlet-like issues
  6. After the war 1,300 newspaper titles were collected & preserved at the NIOD… https://commons.wikimedia.org/wiki/File:Verzetskrant_in_archiefdozen_bij_het_NIOD.jpg – CC-BY-SA - OlafJanssen. The national Institute for War, Holocaust and Genocide Studies in Amsterdam
  7. http://opac-gonext.oclc.org:8180/DB=8/XMLPRS=Y/PPN?PPN=107123223 …and were described in formal library catalogues (1,300 titles). Bibliographic metadata. Underground students’ newspaper from The Hague
  8. In 2010 these WW2 newspapers were digitized…
  9. www.delpher.nl/kranten …into full-texts in Delpher… (1,300 titles). The Dutch national aggregator for historic full-texts • Newspapers • Books • Magazines
  10. In Delpher you can read and word-search these newspapers…
  11. But say I want to know more about this newspaper: • What sort of illegal newspaper was it? • What is the history of this newspaper? • Who wrote it? • Where was this newspaper printed? • How was it distributed? • Were there any relations with other underground newspapers? • Etc…
  12. But say I want to know more about this newspaper: • What sort of illegal newspaper was it? • What is the history of this newspaper? • Who wrote it? • Where was this newspaper printed? • How was it distributed? • Were there any relations with other underground newspapers or resistance groups? • Etc…
  13. But say I want to know more about this newspaper: • What sort of illegal newspaper was it? • What is the history of this newspaper? • Who wrote it? • Where was this newspaper printed? • How was it distributed? • Were there any relations with other underground newspapers? • Etc… You can’t answer these questions from Delpher
  14. Big drawback of Delpher: No contextual information about WW2 underground newspapers https://thejungleisneutral.files.wordpress.com/2013/11/lost.jpg
  15-16. Where would many people go to find contextual information about historic newspapers? Probably Wikipedia (via Google)
  17. http://nl.wikipedia.org/wiki/De_Geus_onder_studenten_(verzetsblad) Where would many people go to find contextual information about historic newspapers? Probably Wikipedia (via Google)
  18-19. http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg
  20-23. http://2.bp.blogspot.com/_BWzuYwiS6-I/TMgeRsFd3mI/AAAAAAAAElw/3cvgbZSPWcs/s1600/doctor+macro+judy+scared.jpg Information on Dutch underground newspapers was distributed across multiple, unconnected sources: 1. Descriptions (metadata in library catalogue, 1,300 titles) 2. Content (full-text in Delpher, 1,300 titles) 3. Context (in Wikipedia… at least...)
  24. This Wikipedia article is a carefully chosen exception
  25. 1. Very few illegal newspapers had their own WP articles 2. The inventory of these newspapers on WP:NL was far from complete (far fewer than 1,300 titles)
  26. We tackled both problems!
  27. Wikiproject “Systematically and uniformly describe all 1,300 Dutch underground newspapers from WW2 on Wikipedia” tinyurl.com/verzetskranten
  28. Wikiproject “Systematically and uniformly describe all 1,300 Dutch underground newspapers from WW2 on Wikipedia” tinyurl.com/verzetskranten Reach big audiences
  29. https://thejungleisneutral.files.wordpress.com/2013/11/lost.jpg We badly needed contextual information about the newspapers. Where did we get it? De Ondergrondse Pers 1940-1945, Lydia E. Winkel, H. de Vries, 1989. This paper book contains entries about all 1,300 illegal newspapers
  30. Entry 199 – De Geus; (onder studenten)
  31. Unique ID (within the book)
  32. Place of publication Newspaper Place name
  33. Context. Raw material for Wikipedia article!
  34. Person names Newspaper Persons
  35. IDs of related students’ newspapers This newspaper Other newspapers
  36. We OCRed this book into PDF + put it online under CC-BY-SA http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945 (PDF)
  37-38. We OCRed this book into PDF (CC-BY-SA) http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945 (PDF) Available online (PDF, flat file). Open license (CC-BY-SA). Convert PDF into structured database. Link titles to places, persons, other titles. Link titles to KB-catalogue (metadata) and Delpher (full-text). Link titles, persons and places to external sources
  39. We OCRed this book into PDF (CC-BY-SA) http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945 (PDF) Available online (PDF, flat file). Open license (CC-BY-SA). Convert PDF into structured database. Link: titles → places, persons, other titles. Link titles to KB-catalogue (metadata) and Delpher (full-text). Link titles, persons and places to external sources
  40. We OCRed this book into PDF (CC-BY-SA) http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945 (PDF) Available online (PDF, flat file). Open license (CC-BY-SA). Convert PDF into structured database. Link: titles → places, persons, other titles. Link: titles → library catalogue (metadata) and Delpher (full-text). Link titles, persons and places to external sources
  41. We OCRed this book into PDF (CC-BY-SA) http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945 (PDF) Available online (PDF, flat file). Open license (CC-BY-SA). Convert PDF into structured database. Link: titles → places, persons, other titles. Link: titles → library catalogue (metadata) and Delpher (full-text). Link: titles, persons and places → external sources
  42. Convert PDF into structured database. Link: titles → places, persons, other titles. Link: titles → library catalogue (metadata) and Delpher (full-text). Link: titles, persons and places → external sources. LOD & database expert Gerard Kuys
  43. Convert PDF into structured database. Link: titles → places, persons, other titles. Link: titles → library catalogue (metadata) and Delpher (full-text). Link: titles, persons and places → external sources
  44. VIAF
  45. Available online (PDF, flat file). Open license (CC-BY-SA). Convert PDF into structured database. Link: titles → places, persons, other titles. Link: titles → library catalogue (metadata) and Delpher (full-text). Link: titles, persons and places → external sources
  46. Summer 2016. This LOD database is unique in the Netherlands. First time data about underground newspapers was systematically collected and linked online! https://www.pinterest.com/freethewronged/world-war-ii/
  47. Wikiproject “Systematically and uniformly describe all 1,300 Dutch underground newspapers from WW2 on Wikipedia”
  48. We have: LOD database. Using an article template we generated 1,300 uniform and interlinked Wikipedia stubs https://c1.staticflickr.com/9/8281/7699231918_11a7356c38_b.jpg
  49. https://nl.wikipedia.org/wiki/De_Geus_onder_studenten_(verzetsblad)
  50. Grey = Wikipedia article stub. Automatically generated from database using the article template
  51. https://nl.wikipedia.org/wiki/De_Geus_onder_studenten_(verzetsblad) Non-grey = Wikipedia article stub. Automatically generated from database using the article template
  52. This bit was added manually to expand stub into full article → Crowdsourcing by Dutch Wikipedia community https://nl.wikipedia.org/wiki/De_Geus_onder_studenten_(verzetsblad)
  53. Wikipedia volunteers are expanding the 1,300 stubs… gradually creating more and more full articles. By Sebastiaan ter Burg [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
  54. Before the project
  55. The number of articles is growing steadily…
  56. … making Dutch people wiser and happier! http://www.formerdays.com/2011/05/dutch-liberation.html
  57. Thank you very much! olaf.janssen@kb.nl - @ookgezellig tinyurl.com/verzetskranten
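
Slides 30 to 35 of the transcript walk through the parts of one book entry: a unique ID, a place of publication, person names, IDs of related newspapers and a block of context. A minimal sketch of how such an entry might be held as a structured record follows. The class name, field names and placeholder values are assumptions for illustration, and the check for related IDs that point at no known entry is an added suggestion rather than something the slides describe.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BookEntry:
    """One entry from the printed reference work (field names are hypothetical)."""

    entry_id: int                      # unique ID within the book
    title: str
    place: str                         # place of publication
    persons: list[str] = field(default_factory=list)
    related_ids: list[int] = field(default_factory=list)
    context: str = ""                  # free text, raw material for an article


def dangling_references(entries):
    """Map entry_id -> related IDs that match no entry in the collection."""
    known = {e.entry_id for e in entries}
    result = {}
    for e in entries:
        missing = sorted(set(e.related_ids) - known)
        if missing:
            result[e.entry_id] = missing
    return result


if __name__ == "__main__":
    # Entry number and title come from slide 30; everything else is a placeholder.
    entries = [
        BookEntry(199, "De Geus; (onder studenten)", "PLACE PLACEHOLDER", related_ids=[200, 999]),
        BookEntry(200, "TITLE PLACEHOLDER", "PLACE PLACEHOLDER"),
    ]
    print(dangling_references(entries))
```

A check like this is useful after OCR, where a misread digit in a related ID would otherwise silently link a newspaper to the wrong title or to nothing at all.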
