2. 2
Objective
To introduce the concept of linked data
without too much technical stuff!
(because every conference you attend
these days mentions linked data or linked
open data or linked library data or linked
open library data!)
(or you will see tweets with #lod #lodlam)
Introducing Linked Data
3. 3
Definition of Linked Data
"describes a method of publishing
structured data so that it can be interlinked
and become more useful. It builds upon
standard Web technologies such as
HTTP, RDF and URIs, but rather than using
them to serve web pages for human
readers, it extends them to share
information in a way that can be read
automatically by computers.” (emphasis
added)
From Wikipedia linked data page
4. 4
Human-readable vs. machine-actionable*
Look at this Wikipedia page and tell me
what you know about Margaret Atwood
from looking at the page
*rather than machine-readable, library
consultant Karen Coyle often uses the term
actionable data, which I find easier to
understand. See her Library Technology
Report on the semantic web.
6. 6
A linked data web
Alison’s guide to Margaret Atwood
person
Is subject of
Margaret Atwood
Is type of
Has homepage
Undefined URL link
http://margaretatwood.ca/
Inspired by a semantic web slide by Eric Miller
8. 8
Identify your data
This resource is a person
Name: “Margaret Atwood”
Birth date: 19391118
Place of birth: Ottawa, Ontario
Occupation: novelist
Occupation: poet
Author of: “The Handmaid’s Tale”
9. 9
Publish your data on the web
The Virtual International Authority File
(VIAF) combines authorities from many
national libraries and has made the
records available on the web
With a permanent identifier
In multiple web-friendly formats
Go to Record for Margaret Atwood in VIAF
10. 10
Make connections
Build connections between your data
records and other datasets
Many datasets link to DBpedia, which is
structured data extracted from Wikipedia
Go to DBpedia page for Margaret Atwood
and find the VIAF identifier
11. 11
The famous linked data cloud
The linked data cloud shows the
connections between datasets on the
web
Excerpt from: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
http://lod-cloud.net/”
12. 12
Connect your data
This resource is a person
Use class of persons from the Friend of a
Friend (FOAF) ontology
Place of birth: Ottawa, Ontario
Could link to Geonames
Occupation: novelist
Could link to LCSH term
Author of: “The Handmaid’s Tale”
Could link to The Open Library page
13. 13
Library Use Cases*
Enrich our bibliographic data
Enrich our authority data
Align subject vocabularies
Share our unique collections and
information
*for our next linked data session!
14. 14
Some technical stuff*
Ideally everything has a uniform resource
identifier (URI) e.g. http://viaf.org/viaf/109322990
Data is modeled using Resource
Description Framework (RDF)
Use a common format such as Extensible
Markup Language (XML)
*for our next linked data session!
15. 15
Some resources
Coyle, Karen. Understanding the semantic web: bibliographic data
and metadata. Chicago: American Library Association, 2010.
(Library Technology Reports ; v. 46, no. 1) access at
http://www.metapress.com.proxy.lib.uwaterloo.ca/content/g212v1783607/
(subscription required)
Harper, Corey. Library linked data: tuning library metadata for the
semantic web. An ALCTS webcast, March 16, 2011. access at
http://www.ala.org/alcts/confevents/upcoming/webinar/cat/031611
(open access)
Berners-Lee, Tim. The next web. A TED talk, February 2009. access at
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
(open access)
Heath, Tom and Christian Bizer. Linked Data: Evolving the
Web into a Global Data Space. 1st ed. Morgan & Claypool, 2011.
(Synthesis Lectures on the Semantic Web: Theory and
Technology, 1:1) access at http://linkeddatabook.com/editions/1.0/
(open access)
16. 16
Acknowledgments
Thank you to library consultant Karen
Coyle who explains these concepts in
such a straightforward way
Thank you to Corey Harper at NYU and MJ
Suhonos who are very patient and
encouraging; they have answered many
of my LOD questions and reviewed
presentations for me
I first heard about linked data when I started teaching Classification & Indexing at FIMS; the previous instructor had a section on the Semantic Web and I thought “What the heck is that???” Now it has become fairly common to see posts, webinars, conference sessions, workshops, courses and so on about this thing called linked data. Today I want to introduce this topic to you, but I will try to stick to concepts rather than the technical side of how it is accomplished.
Here is a definition of linked data from Wikipedia, because many of us turn to Wikipedia for a quick introduction to terms we don’t know! There are a few things to point out here: publishing data so that it is available on the web; the concept of structured data; and the difference between how humans consume information on the web and how computers consume information on the web. We’re going to talk about these, although not necessarily in this order. Don’t worry about HTTP, RDF and URI at the moment if you don’t know those terms. What is important is the use of standards.
To start we’re going to take a quick look at this Wikipedia page for Margaret Atwood. What can you tell me about Margaret Atwood from viewing this page? Some of the things we know: Margaret Atwood is a person. How do we know this? She looks like a person, she has a birth date and place, and she is an author; the type of information given is the type that we associate with being a person. We know where and when she was born, what she does, what she has written, what her father did, that she was a voracious reader, when she started writing and so on. We can easily digest these sentences and make meaning from them. We also assume, for example, that when we click on “Arthur C. Clarke Award” it will take us to a page about that award. This page is pretty easy to understand and use by a person who reads English.
In the classic web a machine would have a hard time acting on the information on a webpage in any meaningful way. It can follow links from one page to another but it doesn’t have any information about how those pages relate to each other and what information those pages have about a resource. You would simply click on the link text or URL and the machine would take you to that new location. The two resources are joined only by a generic hyperlink that carries no meaning of its own. You might assume that clicking on a particular link is taking you to Margaret Atwood’s homepage but the computer is simply going from one resource location to another.
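As a rough illustration of what the machine "sees" in the classic web, here is a minimal Python sketch using only the standard library. The page fragment is made up for this example, and the `LinkCollector` class is a hypothetical helper, not part of any linked data tool. All it can recover is a target address and some link text; nothing says that the target is a homepage or that it belongs to a person.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the target and the link text of every <a href="..."> it sees."""

    def __init__(self):
        super().__init__()
        self.links = []            # list of (href, link text) pairs
        self._current_href = None  # set while we are inside an <a> element
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._current_href = href
                self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


# Made-up page fragment, for illustration only.
page = '<p>Visit <a href="http://margaretatwood.ca/">Margaret Atwood</a> online.</p>'

collector = LinkCollector()
collector.feed(page)
collector.close()
links = collector.links
print(links)
```

The result is just a pair of strings. A human reader guesses that the link leads to a homepage; the program has no way to know what kind of relationship the link expresses.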
In the linked data web we give the machines more information about the relationships between things. In some cases there is no more information; I might just have a generic hyperlink from my page about Margaret Atwood to her home page. But DBpedia, the structured dataset extracted from Wikipedia, might define that Margaret Atwood is a person, that this URL they are linking to is a homepage, and that this other page they are linking to has Margaret Atwood as a subject.
I was talking to Sandra last night and her interpretation of the phrase linked data was a link from a journal article to a data set, for example. So let's use this as our example. Traditionally you would have a page for a journal article, and those who know HTML would add an a href tag with the URL of the data set in it, along with a nice friendly human-readable description such as "link to associated dataset". This is great for humans. The machine just does what it is told and moves from one link to the next.
Now imagine you have some statements about this journal article. This article is written by Sandra. This article has the title Library staff training preferences for learning with data. This article has the dataset training survey results. These statements can be used together to allow the computer to follow its nose. So if this article is written by Sandra and this article has this dataset, then this dataset is also associated with Sandra. What other datasets are associated with Sandra? And who is Sandra? If she is an author, then she is probably a person or institution. Does this person or institution have an identifier? Maybe the journal site has even given a "same as" relationship between their ID for Sandra and an ID for Sandra in ORCID. Now the machine can send a query to ORCID to find out more about Sandra and perhaps pull in associated articles. This collection of short statements starts to build up and allows us to explore from one statement to another.
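The "follow its nose" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every identifier below (`article:1`, `person:sandra`, `orcid:sandra-placeholder` and so on) is a made-up placeholder rather than a real URI, the second article and its dataset are hypothetical, and the relationship names are invented for the example rather than taken from a published vocabulary.

```python
# Each statement is a (subject, relationship, object) triple.
# All identifiers are made-up placeholders, not real URIs.
statements = [
    ("article:1", "has_author", "person:sandra"),
    ("article:1", "has_title",
     "Library staff training preferences for learning with data"),
    ("article:1", "has_dataset", "dataset:training-survey-results"),
    # A hypothetical second article, added only to show the chain continuing.
    ("article:2", "has_author", "person:sandra"),
    ("article:2", "has_dataset", "dataset:second-survey"),
    # The journal's ID for Sandra is declared equivalent to an ID elsewhere.
    ("person:sandra", "same_as", "orcid:sandra-placeholder"),
]


def objects(subject, relationship):
    """Everything that `subject` points to through `relationship`."""
    return [o for s, r, o in statements if s == subject and r == relationship]


def subjects(relationship, obj):
    """Everything that points to `obj` through `relationship`."""
    return [s for s, r, o in statements if r == relationship and o == obj]


def datasets_associated_with(person):
    """Follow the chain: person <- has_author - article - has_dataset -> dataset."""
    found = []
    for article in subjects("has_author", person):
        found.extend(objects(article, "has_dataset"))
    return found


sandra_datasets = datasets_associated_with("person:sandra")
sandra_elsewhere = objects("person:sandra", "same_as")
print(sandra_datasets)
print(sandra_elsewhere)
```

No single statement says that a dataset belongs to Sandra; the program gets there by combining two statements that share the same article. The `same_as` statement is what would let it go on to ask another service about the same person.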
It is much easier to do this if we have structured data! The way I like to show structured data is by using something that we are all familiar with to some degree: spreadsheets! If you type something like 4-30-2011 into Excel it recognizes the format and automatically changes it to a date. This is because you have used a standard, well-defined format. You can go one step further and format a cell or row of cells yourself and say to Excel these pieces of data are all dates. Excel then knows the rules for what it can do with dates. So you have one piece of information in a cell and you have told the machine the type of information it is, and because of that type the machine knows what kinds of things can be done with it, for example how to sort, how to calculate number of days etc. It knows to treat dates differently than currency and differently again than textual data. Textual data itself can be totally unstructured or it might be a value in a field, for example the title field of a book record in a database.
So to be structured we should identify our data and when applicable use standard formats.
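The spreadsheet point can be shown with a small Python sketch. The three date strings are examples only (4-30-2011 is the one mentioned above; the other two are made up), and the record uses the birth date from the slide. The sketch assumes the dates are written month-day-year.

```python
from datetime import datetime

# The same three dates, first as plain text (month-day-year).
as_text = ["4-30-2011", "12-1-2010", "1-15-2012"]

# Sorted as text, the machine only compares characters.
sorted_as_text = sorted(as_text)

# Once we say "these are dates in month-day-year format",
# the machine knows the rules for dates.
as_dates = [datetime.strptime(value, "%m-%d-%Y").date() for value in as_text]
sorted_as_dates = sorted(as_dates)

# It can also calculate with them, e.g. the number of days between two dates.
days_between = (sorted_as_dates[1] - sorted_as_dates[0]).days

# The birth date from the slide, stored as year-month-day with no separators.
record = {"type": "person", "name": "Margaret Atwood", "birth_date": "19391118"}
birth_date = datetime.strptime(record["birth_date"], "%Y%m%d").date()

print(sorted_as_text)
print([d.isoformat() for d in sorted_as_dates])
print(days_between)
print(birth_date.isoformat())
```

Sorted as text, 1-15-2012 comes first because the characters are compared one by one; sorted as dates, 12-1-2010 comes first because the machine has been told what the values are.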
Then we should make that data available on the web so that it can be used with other data. For example, other resources about Margaret Atwood could retrieve from the Virtual International Authority File (VIAF) variant representations of her name and also a list of selected titles she has written. This is because VIAF has established a permanent identifier for the resource Margaret Atwood and has provided the associated data in multiple web-friendly formats.
It is especially useful if you can build connections between your data and other datasets. A useful dataset to link to is DBpedia, the dataset derived from Wikipedia, because many others also link here; you will have linked your data to a much larger universe with one mapping! If we look at the DBpedia page for Margaret Atwood we can see that it includes the VIAF identifier.
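Here is a minimal sketch of why one shared identifier is so useful. The VIAF URI is the one used in this presentation; the DBpedia URI follows DBpedia's usual naming pattern and should be treated as an assumption. The two small dictionaries stand in for two separate datasets and hold only facts from the earlier slide; real datasets would be queried over the web rather than typed in by hand, and `describe` is a hypothetical helper.

```python
VIAF_ATWOOD = "http://viaf.org/viaf/109322990"                  # from the slides
DBPEDIA_ATWOOD = "http://dbpedia.org/resource/Margaret_Atwood"  # assumed URI

# Stand-in for our own data, keyed by the VIAF identifier.
our_data = {
    VIAF_ATWOOD: {
        "name": "Margaret Atwood",
        "occupation": ["novelist", "poet"],
    },
}

# Stand-in for another dataset, keyed by its own identifier.
other_data = {
    DBPEDIA_ATWOOD: {
        "place_of_birth": "Ottawa, Ontario",
        "author_of": ["The Handmaid's Tale"],
    },
}

# The connection: the other dataset records the VIAF identifier
# for the same resource.
same_as = {DBPEDIA_ATWOOD: VIAF_ATWOOD}


def describe(viaf_id):
    """Combine what both datasets say about one resource."""
    combined = dict(our_data.get(viaf_id, {}))
    for other_id, linked_viaf_id in same_as.items():
        if linked_viaf_id == viaf_id:
            combined.update(other_data.get(other_id, {}))
    return combined


atwood = describe(VIAF_ATWOOD)
print(atwood)
```

Neither dataset holds everything on its own, but the single mapping between the two identifiers lets a program bring the statements together.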
How do I know that we should link to DBpedia to link to other things? By looking at the linked data cloud! You can see the large number of connections coming in and out of DBpedia in this visualization. The library-related datasets are over on the right.
So we don’t simply expose our data to the web, we make connections in various ways that will lead to other connections!
If you are interested in what this means for libraries I am happy to do a follow-up session showing library use cases. In brief here are some examples. I am currently a member of the ELUNA/IGeLU Linked Open Data Special Interest and Working Group that is collecting use cases for conversations with Ex Libris.
Additionally, in a follow-up session I could talk about some of the technical details. For those who are interested, I am doing a linked data session for IST on Fri. Dec. 13th at 9 a.m. in MC2009.