1. Linked Open Data for
Digital Humanities
What is Linked Open Data and
why is it relevant for you?
Christophe Guéret (@cgueret)
2. Open Data
“A piece of data or content is open if anyone
is free to use, reuse, and redistribute it —
subject only, at most, to the requirement to
attribute and/or share-alike.”
http://opendefinition.org/
3. Linked Data
"a term used to describe a recommended
best practice for exposing, sharing, and
connecting pieces of data, information, and
knowledge on the Semantic Web using URIs
and RDF."
http://linkeddata.org/
4. Linked Open Data
● Linked Open Data = Open Data + Linked
Data
● Interconnected data sets that are on the
Web and free to use
● 5-star scheme http://5stardata.info/
5. Why does it matter for DH?
● Digital Humanities use a lot of data and
study relations between things
● Data acquisition & curation represent a
LOT of effort for data consumers
● Linked Open Data is a good way to
○ Facilitate your own work (as a data consumer)
○ Facilitate others' work (as a data publisher)
6. Data found on the Web
● You get the following table as a CSV file
Kennis Stad
Christophe Amsterdam
David Parijs
● And that Excel table from somewhere else
Ville Pays
Paris France
Amsterdam Pays-Bas
7. And you want to integrate it
Kennis      Stad        +   Ville      Pays       = ?
Christophe  Amsterdam       Paris      France
David       Parijs          Amsterdam  Pays-Bas
● Data integration issues
○ Kennis, Stad, Ville, Pays ?
○ Parijs = Paris ?
○ Amsterdam = Amsterdam ?
● A lot of work for the (uninformed) consumer!
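The mismatch can be reproduced in a few lines. The Python sketch below joins the two tables on the raw city strings; the variable names are illustrative assumptions, not part of the original slides.

```python
# Rows from the CSV file (Dutch headers: Kennis = acquaintance, Stad = city)
people = [
    {"Kennis": "Christophe", "Stad": "Amsterdam"},
    {"Kennis": "David", "Stad": "Parijs"},
]

# Rows from the Excel table (French headers: Ville = city, Pays = country)
cities = [
    {"Ville": "Paris", "Pays": "France"},
    {"Ville": "Amsterdam", "Pays": "Pays-Bas"},
]

# Naive integration: match the two city columns on their raw strings
country_of = {row["Ville"]: row["Pays"] for row in cities}
joined = [
    (p["Kennis"], p["Stad"], country_of.get(p["Stad"]))
    for p in people
]

for row in joined:
    print(row)
# ('Christophe', 'Amsterdam', 'Pays-Bas')
# ('David', 'Parijs', None)
```

David ends up without a country because "Parijs" and "Paris" are different strings, and the consumer had to know beforehand that "Stad" and "Ville" both mean "city".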
8. Linked Data approach
● Assign unique identifiers (URIs) to concepts
and things
● Create a "triple": connect the identifiers with
labelled, directed edges
dbpedia:Amsterdam --dbo:country--> dbpedia:Netherlands
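As a minimal sketch, the triple above can be written in Python as a plain tuple of three full URIs. The `expand` helper is hypothetical and only illustrates how the short prefixed names map to DBpedia's namespaces.

```python
# Namespace prefixes used on the slide, with DBpedia's base URIs
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(name):
    """Turn a prefixed name such as 'dbo:country' into a full URI."""
    prefix, local = name.split(":", 1)
    return PREFIXES[prefix] + local


# A triple is (subject, predicate, object): one directed, labelled edge
triple = (
    expand("dbpedia:Amsterdam"),
    expand("dbo:country"),
    expand("dbpedia:Netherlands"),
)
print(triple)
```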
9. Why does it solve the issue?
● Shift some of the data integration load to the
provider side
○ Clarify the semantics of the data
○ Refer to identifiers rather than names
● There is only one "dbpedia:Amsterdam" at
http://dbpedia.org/resource/Amsterdam
● Labels used for the edges are published by
an external authority
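To illustrate the shift, here is a hedged Python sketch in which both providers publish identifiers instead of names. The two small data sets are hypothetical re-publications of the earlier tables, and the `ex:` identifiers for people are invented for the example.

```python
PARIS = "http://dbpedia.org/resource/Paris"
AMSTERDAM = "http://dbpedia.org/resource/Amsterdam"

# Provider 1 (hypothetical): the city each acquaintance lives in
lives_in = {
    "ex:Christophe": AMSTERDAM,
    "ex:David": PARIS,  # published as a URI, not as the label "Parijs"
}

# Provider 2 (hypothetical): the country of each city
country_of = {
    PARIS: "http://dbpedia.org/resource/France",
    AMSTERDAM: "http://dbpedia.org/resource/Netherlands",
}

# The consumer joins on identifiers; no label matching is needed
result = {person: country_of[city] for person, city in lives_in.items()}
print(result)
```

Because both sides refer to the same URI for Paris, the Dutch and French spellings never enter the join.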
12. From triples to the Web of Data
● Every triple is a bit of factual information
● Because nodes are re-used across triples,
the union of all the triples is a graph
● The "Web of Data" is a pre-integrated,
semantically clear, data set ready to be
used!
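The step from triples to a graph can be sketched in Python as follows. The triples use prefixed names for brevity, and the `ex:` triple is a made-up example rather than real published data.

```python
from collections import defaultdict

triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
    ("ex:Christophe", "ex:livesIn", "dbpedia:Amsterdam"),  # hypothetical
]

# Adjacency list: node -> list of (edge label, target node)
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def reachable(start):
    """Return every node reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(sorted(reachable("ex:Christophe")))
# ['dbpedia:Amsterdam', 'dbpedia:Netherlands']
```

The first and third triples connect only because they reuse the node dbpedia:Amsterdam.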
14. Let's make a social network!
● The network
○ A node per European country
○ An edge means a shared official language
○ Label the edges with the languages
○ Label the nodes with the country names
● Data source
○ DBpedia SPARQL http://dbpedia.org/sparql
● Visualisation tool
○ Gephi https://gephi.org/
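The network definition above can be sketched in Python. The (country, language) rows are a small hand-written sample for illustration, not the output of the DBpedia query, and the row layout is an assumption.

```python
from itertools import combinations

# Illustrative (country, official language) rows
rows = [
    ("Belgium", "Dutch"),
    ("Belgium", "French"),
    ("Netherlands", "Dutch"),
    ("France", "French"),
]

# Group the countries by language
by_language = {}
for country, language in rows:
    by_language.setdefault(language, set()).add(country)

# One edge per pair of countries sharing a language, labelled with it
edges = []
for language, countries in sorted(by_language.items()):
    for source, target in combinations(sorted(countries), 2):
        edges.append((source, target, language))

for edge in edges:
    print(edge)
# ('Belgium', 'Netherlands', 'Dutch')
# ('Belgium', 'France', 'French')
```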
15. SPARQL?
● Query language for Linked Open Data
● Describe part of the graph and use variables
dbpedia:Amsterdam --dbo:country--> ?Country
Suggested
book to read
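The idea of describing part of the graph with variables can be mimicked in a few lines of Python. This is a toy matcher over an in-memory list of triples, written only to illustrate the principle; it is not how a SPARQL engine is implemented, and the `match` function is hypothetical.

```python
triples = [
    ("dbpedia:Amsterdam", "dbo:country", "dbpedia:Netherlands"),
    ("dbpedia:Paris", "dbo:country", "dbpedia:France"),
]


def match(pattern, triples):
    """Yield one {variable: value} binding per triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds to whatever it meets; if it appears
                # twice, both occurrences must meet the same value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                # A constant must match exactly
                break
        else:
            yield binding


pattern = ("dbpedia:Amsterdam", "dbo:country", "?Country")
results = list(match(pattern, triples))
print(results)
# [{'?Country': 'dbpedia:Netherlands'}]
```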
17. Making the network
● Get the query from
○ https://gist.github.com/cgueret/5098706
● Copy & paste it into
○ http://dbpedia.org/sparql
● Change the result format to "CSV"
● Press "Run Query" and save the result
● Open Gephi
● Start a new project
● Import the CSV file in the "Data Laboratory"
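The copy-and-paste steps can also be scripted. The Python sketch below only builds the request URL; the query is a placeholder based on the Amsterdam example, NOT the query from the gist, and the `format` parameter is an assumption about what the DBpedia endpoint accepts.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Placeholder query (not the one from the gist)
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?country WHERE { dbpedia:Amsterdam dbo:country ?country }
"""


def build_url(query, result_format="text/csv"):
    """Build a GET URL asking the endpoint for results in the given format."""
    params = urlencode({"query": query, "format": result_format})
    return ENDPOINT + "?" + params


url = build_url(QUERY)
print(url)

# To actually fetch and save the CSV (requires network access):
# from urllib.request import urlopen
# with urlopen(url) as response, open("result.csv", "wb") as out:
#     out.write(response.read())
```

The saved file can then be imported in Gephi's "Data Laboratory" as in the manual steps.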
20. Last words
● Look for data sources published as Linked
Open Data (RDF); this can save you time
● Consider publishing your own data as Linked
Open Data
● There is much more to say...
○ Using SPARQL within R (very easily)
■ http://linkedscience.org/tools/sparql-package-for-r/
○ Reasoning capabilities of triple stores
○ Creating and extending vocabularies