Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013
2.
1 The semantic web vision (vision)
2 Extracting structured knowledge from free text (txt2rdf)
3 Respect for authority, or, Why we need ontologies (grounding)
3. The semantic web vision
W3C RDF Concepts, 2002 draft
“RDF ... allows anyone to say anything about anything.”
Tim Berners-Lee, 2006
“The day-to-day mechanisms of trade, bureaucracy and our daily
lives will be handled by machines talking to machines, leaving
humans to provide the inspiration and intuition.”
Tim Berners-Lee, 2009
“The web as I envisaged it, we have not seen it yet.”
8. Simple declarative sentences
“In a hole in the ground there lived a hobbit. Not a nasty, dirty,
wet hole, filled with the ends of worms and an oozy smell, nor yet
a dry, bare, sandy hole with nothing in it to sit down on or to eat:
it was a hobbit-hole, and that means comfort.”
(hobbit, lives in, hole)
(hole, located in, the ground)
(hole, does not have, nastiness)
(hole, has type, hobbit hole)
(hobbit hole, has characteristic, comfort)
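Written as RDF, these statements become URI triples. The sketch below is illustrative only: it assumes a hypothetical namespace `http://example.org/hobbit/`, mints each URI naively by joining the words of a phrase with underscores, and maps only "has type" onto the standard `rdf:type` property. A real system would reuse identifiers from an ontology instead of inventing them.

```python
# Sketch: serialise the Hobbit statements as N-Triples.
# BASE is a HYPOTHETICAL namespace; the slides do not define one.
BASE = "http://example.org/hobbit/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# (subject, predicate, object) phrases read off the sentence.
STATEMENTS = [
    ("hobbit", "lives in", "hole"),
    ("hole", "located in", "the ground"),
    ("hole", "does not have", "nastiness"),
    ("hole", "has type", "hobbit hole"),
    ("hobbit hole", "has characteristic", "comfort"),
]


def to_uri(phrase: str) -> str:
    """Naively mint a URI by joining the words of a phrase with underscores."""
    return "<" + BASE + "_".join(phrase.split()) + ">"


def to_ntriples(statements) -> str:
    """Return one N-Triples line per statement; 'has type' becomes rdf:type."""
    lines = []
    for subj, pred, obj in statements:
        pred_uri = RDF_TYPE if pred == "has type" else to_uri(pred)
        lines.append(f"{to_uri(subj)} {pred_uri} {to_uri(obj)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(STATEMENTS))
```

RDF has no built-in negation, so "does not have" ends up as just another predicate here; expressing that statement faithfully would need explicit modelling in an ontology.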
21. Extracting structured knowledge from free text
23. Natural Language Processing pipeline
Text documents
-> Pre-processing: tokenise; sentence and paragraph split; POS tag
-> Named Entity Recognition: multi-word tokens and features -> trained NER model -> list of NEs and classes -> attach siteids
-> Relation Extraction: set of NE pairs and features -> trained RE model -> list of relations and classes
-> RDF translation: remove unwanted relations; generate triples
-> Graph of triples
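The stages above can be sketched end to end in a few lines of Python. This is a toy stand-in, not the system described: the trained NER and RE models are replaced by a hypothetical gazetteer and a single hypothetical relation rule, POS tagging and learned features are omitted, the siteid is hard-coded, and the URI pattern (class prefix, term, siteid, `w`, word index) is an assumption modelled on identifiers such as `sitetype:stone+settings20w179` that appear later in the slides.

```python
import re
from itertools import combinations

SITEID = 20  # HYPOTHETICAL: the record's site id, attached to every entity

# HYPOTHETICAL gazetteer standing in for the trained NER model.
GAZETTEER = {
    ("excavation",): "EVENT",
    ("midden", "deposits"): "SITETYPE",
    ("early", "iron", "age"): "PERIOD",
    ("artefacts",): "ARTEFACT",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

# HYPOTHETICAL rule standing in for the trained RE model:
# (class of first entity, class of second entity) -> relation label.
RELATION_RULES = {("SITETYPE", "PERIOD"): ":hasPeriod"}


def preprocess(text):
    """Paragraph and sentence split, then tokenise with a running word index."""
    sentences, index = [], 0
    for para in re.split(r"\n\s*\n", text.strip()):
        for sent in re.split(r"(?<=[.!?])\s+", para.strip()):
            tokens = []
            for tok in re.findall(r"\w+|[^\w\s]", sent):
                tokens.append((index, tok))
                index += 1
            sentences.append(tokens)
    return sentences


def recognise_entities(tokens, siteid):
    """Greedy longest-match gazetteer lookup; attaches the siteid to each entity."""
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tok.lower() for _, tok in tokens[i:i + n])
            if key in GAZETTEER:
                entities.append({"text": " ".join(key), "cls": GAZETTEER[key],
                                 "start": tokens[i][0], "siteid": siteid})
                i += n
                break
        else:  # no gazetteer entry starts here
            i += 1
    return entities


def extract_relations(entities):
    """Label every ordered entity pair; unmatched pairs get None."""
    return [(a, RELATION_RULES.get((a["cls"], b["cls"])), b)
            for a, b in combinations(entities, 2)]


def to_triples(relations):
    """Remove unwanted (unlabelled) relations and translate the rest to triples."""
    def uri(e):
        term = "+".join(e["text"].split())
        return f"{e['cls'].lower()}:{term}{e['siteid']}w{e['start']}"
    return [(uri(a), rel, uri(b)) for a, rel, b in relations if rel is not None]


def run_pipeline(text, siteid=SITEID):
    graph = []
    for tokens in preprocess(text):
        graph.extend(to_triples(extract_relations(recognise_entities(tokens, siteid))))
    return graph


if __name__ == "__main__":
    sample = ("Excavation revealed midden deposits of an early Iron Age date "
              "and a surface scatter of artefacts of mixed dates.")
    print(run_pipeline(sample))
```

On the sample sentence, four entities are found but only the (midden deposits, early Iron Age) pair survives relation filtering, giving one triple. The word index is counted across the whole document rather than per sentence, so identifiers stay unique within a record.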
24. Named entities and relations
Site 20
Evidence of a quartz knapping site was found within the confines of the stone
circle, and in conjunction with several structures within the inner ring,
strongly suggests a domestic site.
Besides the quartz implements and corresponding waste, several other artifacts of local
origin occurred including a split pebble axe of greenstone with Shetland Early
Bronze Age affinities. B Beveridge, 1972.
Field survey and excavation, as a response to continual wind and marine
erosion, was carried out at the Sands of Breckon between
1982 and 1983.
HP50NW 11.00 was recorded as a stone settings surrounded by
occupational debris (Site 22). Excavation revealed midden deposits of an
early Iron Age date and a surface scatter of artefacts of mixed dates. The
stone settings were tentatively interpreted as the basal stones of long
cists.
Historic Scotland Archive Project (SW) 2002.
28. Respect for authority, or, Why we need ontologies
30. Let’s remind ourselves what the point of Linked Data is
Archaeological site archive
siteid: 47919
sitename: Cairnpapple
classification: Cairn, henge
site number: NS97SE 16
A complex site on the summit of Cairnpapple Hill excavated by Piggot in 1947...
Museum database
objectid: X.EP 167
find spot: Cairnpapple
This stone flake from the cutting edge of a ground stone axehead was found at Cairnpapple in West Lothian. The stone is from...
[Diagram: the two records as RDF graphs. Site archive: :Siteid#site47919 :hasClassn Classn/Sitetype#cairn%20+henge; :hasId Id#ns97se+16; :hasEvent :Event#excavated47919w10, which in turn :hasAgent :Agent/Person#piggot and :hasPeriod :Time/Date#1947. Museum: :Objectid#x.ep+167 :hasClassn :Classn/Objtype#axe+flake, plus a :hasFindSpot link. Location nodes attached via :hasLocation: :Loc/Sitename#cairnpapple, :Loc/Place#cairnpapple+hill and :Loc/Place#west+lothian.]
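The payoff of a shared node can be sketched with plain Python sets of triples. The sketch assumes, hypothetically, that the museum record's find spot and the site archive's location have already been resolved to the same node, `:Loc/Sitename#cairnpapple`; under that assumption a query that starts from the museum object can reach the excavation recorded in the site archive. The helper names are illustrative, not part of any library.

```python
# Two graphs as sets of (subject, predicate, object) strings, using the
# slide's prefixed names. ASSUMPTION: both records already point at the
# same node, :Loc/Sitename#cairnpapple, for the place they describe.
SITE_GRAPH = {
    (":Siteid#site47919", ":hasLocation", ":Loc/Sitename#cairnpapple"),
    (":Siteid#site47919", ":hasEvent", ":Event#excavated47919w10"),
    (":Event#excavated47919w10", ":hasAgent", ":Agent/Person#piggot"),
    (":Event#excavated47919w10", ":hasPeriod", ":Time/Date#1947"),
}
MUSEUM_GRAPH = {
    (":Objectid#x.ep+167", ":hasClassn", ":Classn/Objtype#axe+flake"),
    (":Objectid#x.ep+167", ":hasFindSpot", ":Loc/Sitename#cairnpapple"),
}


def objects(graph, subj, pred):
    """All o such that (subj, pred, o) is in the graph."""
    return {o for s, p, o in graph if s == subj and p == pred}


def subjects(graph, pred, obj):
    """All s such that (s, pred, obj) is in the graph."""
    return {s for s, p, o in graph if p == pred and o == obj}


def excavators_of_find_spot(graph, obj_id):
    """Follow find spot -> site at that location -> event -> agent."""
    agents = set()
    for place in objects(graph, obj_id, ":hasFindSpot"):
        for site in subjects(graph, ":hasLocation", place):
            for event in objects(graph, site, ":hasEvent"):
                agents |= objects(graph, event, ":hasAgent")
    return agents


MERGED = SITE_GRAPH | MUSEUM_GRAPH

if __name__ == "__main__":
    print(excavators_of_find_spot(MERGED, ":Objectid#x.ep+167"))
```

Run against the museum graph alone, the same query returns an empty set: the answer only appears once the two graphs share a node.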
31. But linking Linked Data is actually pretty hard
A direct link means spotting the identical node in a separate graph.
How? String matching? Clues from context?
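String matching on its own can be sketched with the standard library's `difflib.SequenceMatcher`. This is one simple choice of similarity measure, not the method used in the project: it decodes the local name after `#` (turning `+` and `%20` into spaces) and ranks candidate nodes by similarity ratio.

```python
from difflib import SequenceMatcher
from urllib.parse import unquote_plus


def local_name(uri: str) -> str:
    """Text after '#', with '+' and URL escapes decoded and spacing normalised."""
    fragment = uri.rsplit("#", 1)[-1]
    return " ".join(unquote_plus(fragment).lower().split())


def rank_candidates(target: str, candidates):
    """Rank candidate nodes by string similarity of their local names."""
    name = local_name(target)
    scored = [(SequenceMatcher(None, name, local_name(c)).ratio(), c)
              for c in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, uri in rank_candidates(":Loc/Sitename#cairnpapple",
                                      [":Loc/Place#cairnpapple+hill",
                                       ":Loc/Place#west+lothian"]):
        print(f"{score:.2f}  {uri}")
```

The top score goes to `:Loc/Place#cairnpapple+hill`, yet the hill and the monument on its summit are different things. That is why a similarity score cannot by itself decide identity, and clues from context are needed.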
35. Grounding site20 against Monument Thesaurus
[Diagram: the graph extracted for site 20, grounded in the Monument Thesaurus. Extracted nodes: siteid:site20, sitetype:stone+settings20w179 (:hasClassn), event:excavation20w158 (:hasEvent), date:1982 (:hasPeriod), and the locations sitename:sands+of+breckon, address:hp50nw+11.01+hp+5304+0519 and address:breckon (:hasLocation). Grounding links (rdf:type): sitetype:stone+settings20w179 to the thesaurus concept sitetype:stone+setting, and event:excavation20w158 to event:excavation. Thesaurus context for sitetype:stone+setting: rdfs:label "stone setting"; skos:scopeNote "An arrangement of two or more standing stones"; skos:broader, skos:related and rdfs:subClassOf links to sitetype:religious+ritual+and+funerary, sitetype:standing+stone, sitetype:stone+circle, sitetype:stone+row and sitetype:.]
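Grounding a single extracted mention can be sketched as a label lookup followed by an `rdf:type` link. Assumptions throughout: a tiny hypothetical excerpt of the thesaurus (only the concept names visible on the slide, with labels derived from them), a naive plural-stripping rule, and an instance-URI pattern (term, siteid, `w`, word index) inferred from `sitetype:stone+settings20w179` and `event:excavation20w158` rather than documented.

```python
# HYPOTHETICAL thesaurus excerpt: concept names from the slide, keyed by
# labels derived from them (only "stone setting" is shown as a label).
THESAURUS = {
    "sitetype": {
        "stone setting": "sitetype:stone+setting",
        "standing stone": "sitetype:standing+stone",
        "stone circle": "sitetype:stone+circle",
        "stone row": "sitetype:stone+row",
    },
    "event": {"excavation": "event:excavation"},
}


def mint_instance_uri(prefix: str, mention: str, siteid: int, word_index: int) -> str:
    """ASSUMED pattern: prefix:term-with-plus-signs + siteid + 'w' + word index."""
    return f"{prefix}:{'+'.join(mention.lower().split())}{siteid}w{word_index}"


def candidate_labels(mention: str):
    """Yield the normalised mention, then a naive singular form (final 's' removed)."""
    label = " ".join(mention.lower().split())
    yield label
    if label.endswith("s"):
        yield label[:-1]


def ground(prefix: str, mention: str, siteid: int, word_index: int):
    """Return an rdf:type triple linking the mention to a concept, or None."""
    concepts = THESAURUS.get(prefix, {})
    for label in candidate_labels(mention):
        if label in concepts:
            return (mint_instance_uri(prefix, mention, siteid, word_index),
                    "rdf:type", concepts[label])
    return None


if __name__ == "__main__":
    print(ground("sitetype", "stone settings", 20, 179))
    print(ground("event", "excavation", 20, 158))
```

Exact lookup after naive plural stripping is the simplest possible matcher; it misses synonyms and irregular plurals. The scope notes and broader/related links in the thesaurus are what a more careful grounding step could draw on.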
37. Grounding against various authorities/ontologies
Placename authorities: Geonames, OS gazetteer, Pleiades
Period: EH draft ontology
Monument classifications: Seneschal project
Bibliographic: LCSH, FRBR
...hundreds of LOD datasets in the cloud
Informatics projects
Edina “Unlock” service – spatial and temporal grounding
GAP projects – grounding against maps of the ancient world