2. Linked Data
Data published on the Web in accordance with principles
designed to facilitate linkages between resources
The potential for linked data in libraries:
• Eliminates data silos - makes data accessible on the Web
and promotes sharing and re-use
• Promotes discovery of related resources through links
(to common people, subjects, etc.)
• Supports cooperative description
(‘open world assumption’)
3. Key aspects of linked data
• Based on the core Web technologies (HTTP, URIs)
• Uses a simple data structure based on atomic statements
about resources (RDF)
• Can be interpreted by machines (semantic data)
• Focus on connecting resources, rather than simply
describing them (though it can do both)
4. HTTP (Hypertext Transfer Protocol)
The foundation of data communication for the Web
[Diagram: a client/user agent (e.g. web browser) sends an HTTP request to a web server, which returns an HTTP response]
5. URI (Uniform Resource Identifier)
Globally unique identifier for a resource on a computer
or a network.
HTTP URIs identify resources on the Web.
http://www.yourdomain.org/something
6. URI vs. URL
URLs (Uniform Resource Locators) are a subset of URIs
that, in addition to identifying a resource, provide a means of
locating it.
A URI does not necessarily point to a document;
a URL does.
A URI can identify a real-world object.
7. The Semantic Web
Proposed by Tim Berners-Lee in a 2001 article in Scientific
American
“The Semantic Web is not a separate Web but an extension of the current one, in
which information is given well-defined meaning, better enabling computers and
people to work in cooperation…
In the near future, these developments will usher in significant new functionality
as machines become much better able to process and ‘understand’ the data that
they merely display at present.”
8. The Linked Data Principles
Tim Berners-Lee, 2006
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful
information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover
more things.
9. RDF (Resource Description Framework)
A framework for describing Web resources.
A Web resource is anything that can be retrieved or
identified on the Web via a URI.
RDF descriptions are based on simple
subject-predicate-object expressions called “triples”.
10. The RDF Triple
Subject - the resource being described
Predicate - a property of that resource
Object - the value of the property
Subject and predicate are defined using URIs.
Object can either be a URI or a literal value
(text, number, date, etc.)
[Diagram: subject → predicate → object]
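A minimal Python sketch of the triple structure described above. The `Triple` and `Literal` names are illustrative assumptions, not part of any RDF library; the URIs are the ones used in this deck's Robert Moses example.

```python
from typing import NamedTuple, Union


class Literal(NamedTuple):
    """A literal value (text, number, date, etc.), kept as a string here."""
    value: str


class Triple(NamedTuple):
    subject: str                 # always a URI
    predicate: str               # always a URI
    obj: Union[str, Literal]     # either a URI (str) or a Literal


papers = "http://archives.nypl.org/mss/2071"

triples = [
    # Object is a URI: it links to another resource.
    Triple(papers, "http://purl.org/dc/terms/creator",
           "http://viaf.org/viaf/52866196"),
    # Object is a literal: it is just a value.
    Triple(papers, "http://purl.org/dc/terms/extent",
           Literal("142 linear feet")),
]

for t in triples:
    kind = "literal" if isinstance(t.obj, Literal) else "URI"
    print(t.predicate, "->", kind)
```

Keeping literals in their own type makes the URI/literal distinction explicit: only URI objects can be followed to further statements.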
11. Here is some metadata…
Robert Moses Papers
CREATOR:
Moses, Robert, 1888-1981
EXTENT:
142 linear feet
REPOSITORY:
The New York Public Library.
Manuscripts and Archives Division.
12. Here are some triples
Robert Moses Papers | creator | Moses, Robert, 1888-1981
  Subject: http://archives.nypl.org/mss/2071
  Predicate: http://purl.org/dc/terms/creator
  Object: http://viaf.org/viaf/52866196
Robert Moses Papers | extent | 142 linear feet
  Subject: http://archives.nypl.org/mss/2071
  Predicate: http://purl.org/dc/terms/extent
  Object: ‘142 linear feet’
Robert Moses Papers | repository | NYPL Manuscripts & Archives
  Subject: http://archives.nypl.org/mss/2071
  Predicate: http://purl.org/archival/vocab/arch#heldBy
  Object: http://data.nypl.org/org_units/mss
13. A set of related triples = a graph
http://archives.nypl.org/mss/2071
  → http://purl.org/dc/terms/creator → http://viaf.org/viaf/52866196
  → http://purl.org/dc/terms/extent → ‘142 linear feet’
  → http://purl.org/archival/vocab/arch#heldBy → http://data.nypl.org/org_units/mss
14. This is another graph
http://www.worldcat.org/oclc/834874
  → http://purl.org/dc/terms/creator → http://viaf.org/viaf/44312399
  → http://purl.org/dc/terms/subject → http://viaf.org/viaf/52866196
15. Put the graphs together to make a new graph
http://archives.nypl.org/mss/2071 (Robert Moses Papers)
  → http://purl.org/dc/terms/creator → http://viaf.org/viaf/52866196
  → http://purl.org/dc/terms/extent → ‘142 linear feet’
  → http://purl.org/archival/vocab/arch#heldBy → http://data.nypl.org/org_units/mss
http://www.worldcat.org/oclc/834874 (The Power Broker)
  → http://purl.org/dc/terms/creator → http://viaf.org/viaf/44312399
  → http://purl.org/dc/terms/subject → http://viaf.org/viaf/52866196
The two graphs connect through the shared node http://viaf.org/viaf/52866196.
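Because both graphs use the same URI for Robert Moses, combining them needs no record matching. A rough Python sketch, assuming each graph is modelled as a plain set of (subject, predicate, object) tuples (a simplification chosen for illustration, not how a triplestore stores data):

```python
DCT = "http://purl.org/dc/terms/"
ARCH = "http://purl.org/archival/vocab/arch#"

MOSES_PAPERS = "http://archives.nypl.org/mss/2071"
POWER_BROKER = "http://www.worldcat.org/oclc/834874"
ROBERT_MOSES = "http://viaf.org/viaf/52866196"
ROBERT_CARO = "http://viaf.org/viaf/44312399"

# Each graph is a set of (subject, predicate, object) tuples.
nypl_graph = {
    (MOSES_PAPERS, DCT + "creator", ROBERT_MOSES),
    (MOSES_PAPERS, DCT + "extent", "142 linear feet"),
    (MOSES_PAPERS, ARCH + "heldBy", "http://data.nypl.org/org_units/mss"),
}
worldcat_graph = {
    (POWER_BROKER, DCT + "creator", ROBERT_CARO),
    (POWER_BROKER, DCT + "subject", ROBERT_MOSES),
}

# Merging two graphs is a set union of their triples.
merged = nypl_graph | worldcat_graph

# The shared URI now connects both descriptions: find every resource
# that has Robert Moses as the object of some statement.
linked_to_moses = sorted(s for (s, p, o) in merged if o == ROBERT_MOSES)
print(linked_to_moses)
```

The query at the end returns both the archival collection and the book, even though they were described by different institutions.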
16. RDF serialization formats
‘Serialization’ = recording one or more RDF graphs in a
machine-readable file. There are 2 basic options:
RDF in a standalone text file:
• RDF/XML
• N3 (Notation 3)
• Turtle (Terse RDF Triple Language)
• N-Triples
RDF embedded in HTML
• RDFa (RDF in attributes)
18. Basic triples in N3/Turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix arch: <http://purl.org/archival/vocab/arch#> .
<http://archives.nypl.org/mss/2071>
    dcterms:creator <http://viaf.org/viaf/52866196> ;
    dcterms:extent "142 linear feet" ;
    arch:heldBy <http://data.nypl.org/org_units/mss> .
Statements about the same resource are grouped together.
Property URIs are shortened using prefixes (‘q-names’).
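To make the prefix mechanism concrete, here is a small Python sketch that expands prefixed names back to full URIs and writes the statements as N-Triples lines. The helper names `expand` and `to_ntriples` are hypothetical, and the URI-versus-literal test is a deliberate simplification (no escaping, datatypes, or language tags).

```python
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "arch": "http://purl.org/archival/vocab/arch#",
}


def expand(qname: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'dcterms:creator' to a full URI."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


def to_ntriples(subject: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Write one N-Triples line per (prefixed predicate, object) pair.

    Assumption for this sketch: objects starting with 'http' are URIs,
    anything else is a plain string literal.
    """
    lines = []
    for qname, obj in pairs:
        term = f"<{obj}>" if obj.startswith("http") else f'"{obj}"'
        lines.append(f"<{subject}> <{expand(qname)}> {term} .")
    return lines


lines = to_ntriples(
    "http://archives.nypl.org/mss/2071",
    [
        ("dcterms:creator", "http://viaf.org/viaf/52866196"),
        ("dcterms:extent", "142 linear feet"),
    ],
)
print("\n".join(lines))
```

N-Triples spells out every URI in full on every line, which is why Turtle's grouping and prefixes are easier to read for the same data.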
20. RDFa (RDF in Attributes)
RDFa allows RDF data to be embedded within HTML.
Rendered HTML:
The Power Broker, by Robert Caro, is a biography of Robert Moses.
HTML code:
<div about="http://www.worldcat.org/oclc/834874"
prefix="dcterms: http://purl.org/dc/terms/">
The Power Broker, by <span property="dcterms:creator"
resource="http://viaf.org/viaf/44312399">Robert Caro</span>, is a biography of
<span property="dcterms:subject"
resource="http://viaf.org/viaf/52866196">Robert Moses</span>.
</div>
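The sketch below shows, in Python, how a machine could recover triples from that markup. It is a toy extractor written for this example only: the `TinyRDFa` class is hypothetical and handles just the `about`, `prefix`, `property`, and `resource` attributes, which is far less than a conforming RDFa processor does.

```python
from html.parser import HTMLParser

RDFA = """
<div about="http://www.worldcat.org/oclc/834874"
     prefix="dcterms: http://purl.org/dc/terms/">
  The Power Broker, by
  <span property="dcterms:creator"
        resource="http://viaf.org/viaf/44312399">Robert Caro</span>,
  is a biography of
  <span property="dcterms:subject"
        resource="http://viaf.org/viaf/52866196">Robert Moses</span>.
</div>
"""


class TinyRDFa(HTMLParser):
    """Collects triples from about/prefix/property/resource attributes only."""

    def __init__(self):
        super().__init__()
        self.subject = None   # URI from the most recent 'about' attribute
        self.prefixes = {}    # prefix name -> namespace URI
        self.triples = []     # (subject, predicate, object) tuples

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "prefix" in a:
            # "dcterms: http://..." is a list of alternating name/URI tokens.
            parts = a["prefix"].split()
            for name, uri in zip(parts[0::2], parts[1::2]):
                self.prefixes[name.rstrip(":")] = uri
        if "about" in a:
            self.subject = a["about"]
        if "property" in a and "resource" in a:
            prefix, _, local = a["property"].partition(":")
            predicate = self.prefixes[prefix] + local
            self.triples.append((self.subject, predicate, a["resource"]))


parser = TinyRDFa()
parser.feed(RDFA)
for triple in parser.triples:
    print(triple)
```

The human-readable sentence and the two machine-readable triples come from the same HTML, which is the point of embedding RDF in attributes.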
21. RDF Ontologies/vocabularies
• Define categories of things and the relationships that they
can have to each other
• Provide the semantics that allow data to be interpreted
by machines
• Establish rules of inference – what can be assumed to
be true based on what is asserted by a triple
22. RDFS (RDF Schema)
A basic vocabulary for ontology development.
RDFS defines RDF classes and properties.
Class: a category of resources; a resource in such a
category is said to be an instance of the class
Property: a relation between a subject and object in a triple
23. Classes and subClasses
The subClassOf property (used in defining a class) allows a
broad class to serve as the basis of a more specific class.
Defining a class (A) as a subClassOf another class (B)
means that any instance of A can be inferred to also be an
instance of B.
Class B
Class A
24. A simple Class/subClass example
Based on these class definitions:
‘Dog’ is a Class
‘Poodle’ is a Class
‘Poodle’ is a subClassOf ‘Dog’
And the statement:
Fido is a Poodle.
It can be inferred that:
Fido is a Dog.
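The inference can be sketched in a few lines of Python. This is an illustration of the rule only, assuming each class has at most one direct superclass; real RDFS reasoners work over full graphs and allow multiple superclasses.

```python
# Class hierarchy: class -> its direct superclass (rdfs:subClassOf).
SUBCLASS_OF = {"Poodle": "Dog"}

# Asserted statements: instance -> class (rdf:type).
ASSERTED_TYPES = {"Fido": "Poodle"}


def inferred_types(instance: str) -> list[str]:
    """Return the asserted class, then every superclass reachable from it."""
    types = []
    cls = ASSERTED_TYPES.get(instance)
    # Walk up the hierarchy; the membership check guards against cycles.
    while cls is not None and cls not in types:
        types.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return types


print(inferred_types("Fido"))
```

Only "Fido is a Poodle" was stated; "Fido is a Dog" falls out of the class definition.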
25. RDFS Properties
The predicates in RDF triples are properties.
Properties themselves have two important properties:
domain: asserts that the subject of the triple is an instance
of a specific class
range: asserts that the object of the triple is an instance of
a specific class
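A short Python sketch of how domain and range drive inference. The `hasOwner` property, its domain and range, and the individuals are all hypothetical, invented for this illustration.

```python
# Hypothetical property definitions: property -> (domain class, range class).
PROPERTIES = {"hasOwner": ("Dog", "Person")}


def infer_types(triple: tuple[str, str, str]) -> dict[str, str]:
    """Infer the class of the subject (from domain) and object (from range)."""
    subject, predicate, obj = triple
    domain, range_ = PROPERTIES[predicate]
    return {subject: domain, obj: range_}


inferred = infer_types(("Fido", "hasOwner", "Alice"))
print(inferred)
```

Note that domain and range do not validate data: nothing is rejected. They add conclusions, so a single statement using `hasOwner` is enough to infer that Fido is a Dog and Alice is a Person.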
26. OWL (Web Ontology Language)
Provides an extended set of properties used in
ontology/vocabulary definitions (used in conjunction with
RDFS)
• Equivalence/disjunction
• Advanced property definitions
• Restrictions and cardinality
owl:sameAs: A property that asserts that two resources are
the same (i.e. two URIs refer to the same thing)
27. SKOS
(Simple Knowledge Organization System)
Defines classes and properties to support the use of
thesauri, classification schemes, subject heading systems
and taxonomies in RDF
• Classes: skos:ConceptScheme, skos:Concept
• Properties: skos:broader, skos:narrower, skos:related,
skos:prefLabel, skos:altLabel
28. Library of Congress Linked Data Service
(id.loc.gov)
• Provides URIs for LC controlled vocabularies, thesauri,
language codes, classification schemes
• Most terms defined using SKOS + RDF representation
of MADS (where applicable)
• Complete vocabularies available as free downloads
29. FOAF (Friend of a Friend)
• Provides a vocabulary for describing people and their
relationships to each other and to the things they
make and do
• Originally intended for web-based social networks,
FOAF has gained wider acceptance in describing
historical figures and their relationships
• Classes: Agent, Person, Organization, Group
• Properties: knows, name, based_near
30. VIAF (Virtual International Authority File)
• Clusters names in authority files from numerous national
libraries and other agencies
• Named entities vs. just names
• OCLC is actively establishing links between VIAF and
Wikipedia, building an invaluable resource for
libraries/archives/museums to provide context for their
collections
31. Dublin Core Metadata Initiative
• Terms for general use in describing resources
• Properties relating to simple and qualified Dublin Core
elements
• Classes for general material types (Text, Image,
PhysicalObject, etc.)
• Classes for other resources referenced by DCMI
properties (FileFormat, RightsStatement,
ProvenanceStatement, etc.)
32. Schema.org
• Cooperative project between Bing, Google and Yahoo to
provide a mechanism for describing web content via
standardized vocabularies
• Structured data is included in HTML content via microdata
(similar to RDFa)
• Basis of Google Knowledge Graph
• OCLC now provides Schema.org linked data for all
records in WorldCat
33. DBpedia
• Crowd-sourced community effort to extract structured
information from Wikipedia
• Enables sophisticated queries against Wikipedia
• Makes Wikipedia data freely available for re-use
34. Other useful/notable linked data sources
Vocabularies/ontologies
• Bibliographic ontology
• Archival ontology
• Relationship ontology
Data sources
• GeoNames, Europeana, MusicBrainz, data.gov,
nytimes.com, BBC, Project Gutenberg…
35. The obligatory linked data cloud slide
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
36. Technical things to know a little about
• Triplestore – a database for storing RDF data
• SPARQL (SPARQL Protocol and RDF Query Language)
The primary query language for RDF data (analogous to
SQL for relational databases)
• SPARQL endpoint – Web service that provides direct
access to RDF data stores via SPARQL queries
• HTTP content negotiation – process for delivering
content (data) in different formats (e.g. RDF vs. HTML)
based on HTTP request
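The last two items can be sketched with Python's standard library. The endpoint URL is a made-up placeholder, and whether a given server honours a particular `Accept` value is up to that server; the requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical SPARQL endpoint, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# Find every resource whose subject is Robert Moses (VIAF URI from this deck).
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource WHERE {
  ?resource dcterms:subject <http://viaf.org/viaf/52866196> .
}
"""

# SPARQL protocol: a query can be sent by GET in the 'query' URL parameter.
sparql_request = Request(
    ENDPOINT + "?" + urlencode({"query": QUERY}),
    headers={"Accept": "application/sparql-results+json"},
)

# Content negotiation: same URI, but the Accept header asks for RDF (Turtle)
# instead of the HTML a browser would request.
rdf_request = Request(
    "http://www.yourdomain.org/something",
    headers={"Accept": "text/turtle"},
)

print(sparql_request.full_url)
print(rdf_request.get_header("Accept"))
# Passing either request to urllib.request.urlopen() would send it.
```

In both cases the client states the format it wants in the `Accept` header, and the server decides what representation to return.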
37. Linked data attribution
A growing concern in the linked data community is the need
to include attribution with data in order to determine whether
or not it can/should be trusted.
• RDF reification – allows source attribution to be associated with an
RDF triple
• Named graphs – Extension of RDF that allows attribution and other
metadata to be associated with RDF descriptions
• Quad stores – Similar to triplestores but with an additional element
that connects the triple with its source
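A rough Python sketch of the quad idea: each statement carries a fourth element naming the graph it came from, so statements can be filtered by source. The graph names under example.org are hypothetical.

```python
DCT = "http://purl.org/dc/terms/"

# Hypothetical graph names identifying where each statement came from.
NYPL_GRAPH = "http://example.org/graphs/nypl"
WORLDCAT_GRAPH = "http://example.org/graphs/worldcat"

# A quad is (subject, predicate, object, graph).
quads = [
    ("http://archives.nypl.org/mss/2071", DCT + "creator",
     "http://viaf.org/viaf/52866196", NYPL_GRAPH),
    ("http://www.worldcat.org/oclc/834874", DCT + "creator",
     "http://viaf.org/viaf/44312399", WORLDCAT_GRAPH),
    ("http://www.worldcat.org/oclc/834874", DCT + "subject",
     "http://viaf.org/viaf/52866196", WORLDCAT_GRAPH),
]


def from_source(all_quads, graph):
    """Return only the triples asserted in the given named graph."""
    return [(s, p, o) for (s, p, o, g) in all_quads if g == graph]


trusted = from_source(quads, WORLDCAT_GRAPH)
print(len(trusted), "statements from", WORLDCAT_GRAPH)
```

Once the source travels with each statement, an application can choose which sources to trust after the data has been merged.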
38. Linked Open Data
Linked data that is freely usable, reusable, and
redistributable, subject at most to attribution and ‘share
alike’ requirements
39. Open data licensing
Creative Commons (CC) is a nonprofit organization that enables the sharing and use of
creativity and knowledge through free legal tools.
CC provides alternatives to “all rights reserved” copyright.
40. Creative Commons Licenses
Open data:
Attribution (CC BY)
Allows distribution and reuse in any way as long as you get credit
Attribution-ShareAlike (CC BY-SA)
Allows distribution and reuse in any way as long as you get credit and
derivative works are released under the same license
Not open data:
Attribution-NoDerivs (CC BY-ND)
Requires that the original is used unchanged and in whole, with credit to you
Attribution-NonCommercial (CC BY-NC)
Allows distribution and reuse in any way, for non-commercial purposes only, as long as
you get credit
Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
Allows distribution and reuse in any way, for non-commercial purposes only, as long as
you get credit and derivative works are released under the same license
Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
Only permits use as-is, for non-commercial purposes, and with credit to you (the most
restrictive CC license available)
41. CC0 (‘CC Zero’)
• Allows creators to waive all rights to work and to place it
as completely as possible into the public domain.
• Designed to make it as clear as is legally possible that any
use of your content is allowed
• Quickly becoming the preferred license for open data
42. LC Bibliographic Framework Initiative
• Developing a new bibliographic framework (to replace
MARC) based on linked data principles
• First draft of the Bibliographic Framework (BIBFRAME)
model published in November 2012