3. Knowledge Graphs - Example (www.adaptcentre.ie)
Image source: https://aws.amazon.com/neptune/
When did this occur? What is the time span? (Valid time)
4. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
What’s the confidence of this fact? (Certainty)
5. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
When were these facts recorded? For how long were they stored in the KG? (Transaction time)
What’s the confidence of this fact? (Certainty)
6. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
When were these facts recorded? For how long were they stored in the KG? (Transaction time)
What’s the confidence of this fact? (Certainty)
Where does this data come from? (Provenance)
7. Popular Use Cases for Contextual Metadata
● Temporal aspects of facts are usually not reflected in KGs
(When are specific statements, i.e. triples, valid?)
● Facts extracted from heterogeneous data sources have different degrees of certainty, depending on the source or on the extraction/generation process
● Efficient solutions for managing the dynamics (the evolution) of KGs are missing
(When were specific statements added or updated?)
● Data provenance is needed: what is the origin of the data?
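The four kinds of contextual metadata above can be attached to a fact as plain fields. The Python sketch below is a minimal illustration, not a standard model: the `Fact` class, its field names, the recording timestamps, the certainty scores and the source names are all hypothetical; only the valid-time dates come from the Cristiano Ronaldo example used later in the deck. It shows that valid time and transaction time are two different filters.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: date            # valid time: when the fact starts to hold in the world
    valid_to: Optional[date]    # None means open-ended
    recorded_at: datetime       # transaction time: when the fact was added to the KG
    certainty: float            # confidence score in [0, 1]
    source: str                 # provenance: origin of the fact

    def valid_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)


# Valid-time dates from the running example; timestamps, scores, sources are hypothetical.
kg = [
    Fact("Cristiano_Ronaldo", "team", "Real_Madrid", date(2009, 7, 1), date(2018, 7, 10),
         datetime(2009, 7, 2, 10, 0), 0.95, "source_A"),
    Fact("Cristiano_Ronaldo", "team", "Juventus", date(2018, 7, 11), None,
         datetime(2018, 7, 12, 10, 0), 0.80, "source_B"),
]


def team_on(day: date, as_of: datetime, min_certainty: float = 0.5) -> list[str]:
    """Teams valid on `day`, according to what the KG had recorded by `as_of`."""
    return [f.obj for f in kg
            if f.predicate == "team"
            and f.recorded_at <= as_of          # transaction-time filter
            and f.valid_on(day)                 # valid-time filter
            and f.certainty >= min_certainty]   # certainty filter


print(team_on(date(2018, 8, 1), as_of=datetime(2018, 7, 11)))  # []
print(team_on(date(2018, 8, 1), as_of=datetime(2018, 7, 13)))  # ['Juventus']
```

Asked on 11 July 2018, the KG had not yet recorded the Juventus fact, so the same valid-time question returns nothing; two days later it does.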
8. Data Provenance with PROV-O
Provenance (W3C definition¹):
“Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.”
PROV-O: W3C ontology (OWL) based on the core PROV data model, http://www.w3.org/TR/prov-o/
¹ https://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
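PROV-O describes provenance with three core classes (`prov:Entity`, `prov:Activity`, `prov:Agent`) linked by properties such as `prov:wasGeneratedBy`, `prov:used`, `prov:wasDerivedFrom`, `prov:wasAssociatedWith` and `prov:wasAttributedTo`. The Python sketch below stores a few such triples as tuples and walks them back to find every upstream source of a dataset. The `:`-prefixed resources (`:kg_v2`, `:extraction_run_1`, and so on) and the helper functions are hypothetical; a real system would use an RDF library and SPARQL instead.

```python
# PROV-O triples as (subject, predicate, object) tuples.
# The ':'-prefixed resources are hypothetical; the prov: terms are PROV-O vocabulary.
graph = {
    (":kg_v2", "rdf:type", "prov:Entity"),
    (":kg_v2", "prov:wasGeneratedBy", ":extraction_run_1"),
    (":kg_v2", "prov:wasAttributedTo", ":data_team"),
    (":extraction_run_1", "rdf:type", "prov:Activity"),
    (":extraction_run_1", "prov:used", ":news_corpus"),
    (":extraction_run_1", "prov:wasAssociatedWith", ":extractor_bot"),
    (":news_corpus", "rdf:type", "prov:Entity"),
    (":news_corpus", "prov:wasDerivedFrom", ":rss_feed"),
    (":extractor_bot", "rdf:type", "prov:Agent"),
}


def objects(g, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in g if s == subject and p == predicate}


def upstream_sources(g, entity):
    """Entities that `entity` comes from, directly or via the activity that generated it."""
    found, stack = set(), [entity]
    while stack:
        current = stack.pop()
        sources = set(objects(g, current, "prov:wasDerivedFrom"))
        for activity in objects(g, current, "prov:wasGeneratedBy"):
            sources |= objects(g, activity, "prov:used")
        for src in sources - found:
            found.add(src)
            stack.append(src)
    return found


print(sorted(upstream_sources(graph, ":kg_v2")))  # [':news_corpus', ':rss_feed']
```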
11. Example of Statement-Level Metadata
Subject           | Predicate | Object      | Starts       | Ends
Cristiano Ronaldo | team      | Real Madrid | 1 July 2009  | 10 July 2018
Cristiano Ronaldo | team      | Juventus    | 11 July 2018 |
[Figure: Cristiano Ronaldo --team--> Real Madrid]
How can the Starts and Ends values be represented in the graph?
This is the problem of n-ary (not binary) relations.
12. RDF graphs vs. Property graphs
RDF Graphs
● Formally defined data model
● Various well-defined serialization formats
● Well-defined query language with a formal semantics
● Natural support for globally unique identifiers
● Semantics of data can be made explicit in the data itself
● W3C recommendations (standards!)
● High usage complexity
Labeled-Property Graphs (e.g. Neo4j)
● Easy to manage statement-level metadata
● Efficient graph traversals
● Fast and scalable implementations
● No open standards defined
● Different proprietary implementations and query languages
● Good adoption in enterprise
13. RDF graphs vs. Property graphs
RDF Graphs
Vertices
Every statement (triple) contributes two vertices, its subject and its object; vertices can be shared between statements.
Some are uniquely identified by URIs: Resources
Some are property values: e.g. Literals
Edges
Every statement produces one edge.
Edges are labeled with a predicate URI (the same URI can label many edges).
Vertices and edges have NO internal structure
Labeled-Property Graphs (e.g. Neo4j)
Vertices
Unique Id + set of key-value pairs
Edges
Unique Id + set of key-value pairs
Vertices and edges have internal structure
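The difference in internal structure can be sketched in Python under simplifying assumptions: RDF terms are plain strings with a hypothetical `ex:` prefix, and `Vertex`/`Edge` are ad hoc classes, not any product's API. An RDF graph is just a set of triples, so a triple has no identity to hang metadata on, whereas a property-graph edge carries its own id and key-value pairs.

```python
from dataclasses import dataclass, field

# RDF: a graph is a set of (subject, predicate, object) triples.
# Asserting the same triple twice still yields a single edge with no id or properties.
rdf_graph = {
    ("ex:Cristiano_Ronaldo", "ex:team", "ex:Real_Madrid"),
    ("ex:Cristiano_Ronaldo", "ex:team", "ex:Real_Madrid"),
    ("ex:Real_Madrid", "ex:name", '"Real Madrid"'),
}
rdf_vertices = {s for s, _, _ in rdf_graph} | {o for _, _, o in rdf_graph}


# Labeled-property graph: vertices and edges each have an id plus key-value pairs.
@dataclass
class Vertex:
    id: int
    labels: set
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    src: int
    label: str
    dst: int
    props: dict = field(default_factory=dict)


ronaldo = Vertex(1, {"Person"}, {"name": "Cristiano Ronaldo"})
real_madrid = Vertex(2, {"Team"}, {"name": "Real Madrid"})
team = Edge(10, ronaldo.id, "team", real_madrid.id,
            {"starts": "2009-07-01", "ends": "2018-07-10"})

print(len(rdf_graph), len(rdf_vertices))  # 2 3
print(team.id, team.props["starts"])      # 10 2009-07-01
```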
14. RDF graphs vs. Property graphs
Query: Who likes a person named “Ann”?
SPARQL
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://example.org/>   # example namespace (assumption)
SELECT ?who
WHERE
{
  ?who :likes ?a .
  ?a rdf:type :Person .
  ?a :name ?aName .
  FILTER regex(?aName, 'Ann')
}
Cypher (Neo4j)
MATCH
  (who)-[:LIKES]->(a:Person)
WHERE
  a.name CONTAINS 'Ann'
RETURN who
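Both queries describe the same graph pattern. The Python sketch below evaluates the SPARQL basic graph pattern by hand over a set of triples, to show what the engine has to match. The people, the `:` prefix and the data are hypothetical, and `re.search` stands in for SPARQL's `regex` (both perform an unanchored, case-sensitive match here).

```python
import re

# Hypothetical data: only :p1 is a Person whose name contains "Ann".
graph = {
    (":bob", ":likes", ":p1"), (":p1", "rdf:type", ":Person"), (":p1", ":name", "Ann"),
    (":carl", ":likes", ":p2"), (":p2", "rdf:type", ":Person"), (":p2", ":name", "Dora"),
    (":eve", ":likes", ":x"), (":x", "rdf:type", ":Project"), (":x", ":name", "Annex"),
}


def who_likes_ann(g):
    """Hand-evaluated version of the SPARQL pattern on this slide."""
    results = set()
    for who, pred, a in g:
        # ?who :likes ?a .   ?a rdf:type :Person .
        if pred != ":likes" or (a, "rdf:type", ":Person") not in g:
            continue
        for s, p, name in g:
            # ?a :name ?aName .   FILTER regex(?aName, 'Ann')
            if s == a and p == ":name" and re.search("Ann", name):
                results.add(who)
    return results


print(who_likes_ann(graph))  # {':bob'}
```

`:eve` is excluded even though "Annex" contains "Ann", because `:x` is not typed as a `:Person`.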
15. Statement-Level Metadata with Property Graphs
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
(Cristiano Ronaldo) -[team { starts: 2009-07-01, ends: 2018-07-10 }]-> (Real Madrid)
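In a property graph the dates live inside the edge itself, so no extra nodes or edges are needed. A minimal Python sketch, assuming a hypothetical in-memory graph of plain dictionaries (not the Neo4j API); the docstring gives a roughly equivalent Cypher query.

```python
from datetime import date

nodes = {
    1: {"name": "Cristiano Ronaldo"},
    2: {"name": "Real Madrid"},
    3: {"name": "Juventus"},
}
# Each edge stores its statement-level metadata as key-value pairs.
edges = [
    {"id": 10, "src": 1, "type": "team", "dst": 2,
     "props": {"starts": date(2009, 7, 1), "ends": date(2018, 7, 10)}},
    {"id": 11, "src": 1, "type": "team", "dst": 3,
     "props": {"starts": date(2018, 7, 11)}},  # no end date in the example table
]


def teams_on(player, day):
    """Roughly: MATCH (p {name: $player})-[t:team]->(c)
    WHERE t.starts <= $day AND (t.ends IS NULL OR t.ends >= $day) RETURN c.name"""
    result = []
    for e in edges:
        if e["type"] != "team" or nodes[e["src"]]["name"] != player:
            continue
        starts, ends = e["props"]["starts"], e["props"].get("ends")
        if starts <= day and (ends is None or day <= ends):
            result.append(nodes[e["dst"]]["name"])
    return result


print(teams_on("Cristiano Ronaldo", date(2015, 1, 1)))  # ['Real Madrid']
print(teams_on("Cristiano Ronaldo", date(2019, 1, 1)))  # ['Juventus']
```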
16. Modelling (1) - RDF Reification
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
Subject Predicate Object
Cristiano_Ronaldo team Real_Madrid
Stmt1 type Statement
Stmt1 subject Cristiano_Ronaldo
Stmt1 predicate team
Stmt1 object Real_Madrid
Stmt1 starts 2009-07-01
Stmt1 ends 2018-07-10
[Figure: Cristiano_Ronaldo --team--> Real_Madrid, plus a statement node Stmt1 linked by type to Statement, by subject, predicate and object to Cristiano_Ronaldo, team and Real_Madrid, and by starts and ends to 2009-07-01 and 2018-07-10]
17. Modelling (1) - RDF Reification
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
Pros:
1. Easy to understand
Cons:
1. Not scalable: every annotated statement needs 4 extra triples (4N for N statements), on top of the metadata triples
2. No formal semantics defined
3. Discouraged in Linked Open Data (LOD)!
Subject Predicate Object
Cristiano_Ronaldo team Real_Madrid
Stmt1 type Statement
Stmt1 subject Cristiano_Ronaldo
Stmt1 predicate team
Stmt1 object Real_Madrid
Stmt1 starts 2009-07-01
Stmt1 ends 2018-07-10
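The triple counts above can be checked with a small Python sketch of standard reification. The `reify` and `metadata_of` helpers are hypothetical and terms are plain strings with an assumed `:` prefix; `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` are the standard RDF reification vocabulary. Note that a lookup must first find the statement node by matching all three of its `rdf:subject`/`rdf:predicate`/`rdf:object` triples.

```python
def reify(stmt, s, p, o, metadata):
    """Standard RDF reification of (s, p, o) through the statement node `stmt`."""
    triples = [
        (s, p, o),                            # the original triple, still asserted
        (stmt, "rdf:type", "rdf:Statement"),  # 4 reification triples...
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    ]
    # ...plus one triple per metadata value
    triples += [(stmt, key, value) for key, value in metadata.items()]
    return triples


def metadata_of(graph, s, p, o, key):
    """Value of `key` on the statement node that describes (s, p, o), if any."""
    g = set(graph)
    for node, pred, obj in g:
        if pred == "rdf:type" and obj == "rdf:Statement" and {
            (node, "rdf:subject", s),
            (node, "rdf:predicate", p),
            (node, "rdf:object", o),
        } <= g:
            return next((v for x, k, v in g if x == node and k == key), None)
    return None


graph = reify(":Stmt1", ":Cristiano_Ronaldo", ":team", ":Real_Madrid",
              {":starts": "2009-07-01", ":ends": "2018-07-10"})
print(len(graph))  # 7 = 1 original + 4 reification + 2 metadata
print(metadata_of(graph, ":Cristiano_Ronaldo", ":team", ":Real_Madrid", ":ends"))  # 2018-07-10
```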
18. Modelling (2) - Singleton Property
Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property." In Proceedings of the 23rd International Conference on World Wide Web, ACM, 2014.
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
[Figure: Cristiano_Ronaldo --team#1--> Real_Madrid; team#1 is linked by singletonPropertyOf to team, and by starts and ends to 2009-07-01 and 2018-07-10]
Subject Predicate Object
Cristiano_Ronaldo team#1 Real_Madrid
team#1 singletonPropertyOf team
team#1 starts 2009-07-01
team#1 ends 2018-07-10
19. Modelling (2) - Singleton Property
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
Subject Predicate Object
Cristiano_Ronaldo team#1 Real_Madrid
team#1 singletonPropertyOf team
team#1 starts 2009-07-01
team#1 ends 2018-07-10
Pros:
1. More scalable: only 1 extra triple per statement (singletonPropertyOf)
Cons:
1. Less intuitive
2. Large number of unique predicates (one per annotated statement)
3. Requires verbose constructs in queries
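The same example with singleton properties, as a Python sketch in the spirit of Nguyen et al. (2014): a fresh property such as `team#1` is minted for each annotated statement. The `singleton` and `members` helpers and the counter-based naming are hypothetical. The sketch shows both the single extra triple per statement and the growth in unique predicates listed above.

```python
from itertools import count

_next_id = count(1)


def singleton(s, p, o, metadata):
    """Encode (s, p, o) with a fresh singleton property p#n that carries the metadata."""
    sp = f"{p}#{next(_next_id)}"
    triples = [(s, sp, o), (sp, "singletonPropertyOf", p)]  # only 1 extra triple
    triples += [(sp, key, value) for key, value in metadata.items()]
    return triples


def members(graph, p, o):
    """(subject, starts) pairs for statements whose singleton property is an instance of p."""
    g = set(graph)
    return sorted(
        (s, next((v for x, k, v in g if x == sp and k == "starts"), None))
        for s, sp, obj in g
        if obj == o and (sp, "singletonPropertyOf", p) in g
    )


graph = singleton("Cristiano_Ronaldo", "team", "Real_Madrid",
                  {"starts": "2009-07-01", "ends": "2018-07-10"})
graph += singleton("Cristiano_Ronaldo", "team", "Juventus", {"starts": "2018-07-11"})

print(graph[0])                               # ('Cristiano_Ronaldo', 'team#1', 'Real_Madrid')
print(members(graph, "team", "Real_Madrid"))  # [('Cristiano_Ronaldo', '2009-07-01')]
print(sorted({p for _, p, _ in graph}))       # each annotated statement adds a new predicate
```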
20. Modelling (3) - RDF* and SPARQL*
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
RDF extension for nested triples:
<< :Cristiano_Ronaldo :team :Real_Madrid >>
    :starts "2009-07-01" ;
    :ends   "2018-07-10" .
SPARQL extension with nested triple patterns:
SELECT ?player WHERE {
    << ?player :team :Real_Madrid >> :starts ?date .
    FILTER (?date >= "2009-07-01") }
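To see what the nested-triple pattern matches, here is a Python sketch in which a triple (a tuple) can itself appear as the subject of another triple. It is only an illustration under stated assumptions: terms are plain strings or `datetime.date` values, `players_since` is a hypothetical helper, and the base triple is asserted explicitly because, in the W3C RDF-star community group report, quoting a triple does not by itself assert it.

```python
from datetime import date

# A quoted (nested) triple is a tuple used as the subject of other triples.
base = ("Cristiano_Ronaldo", "team", "Real_Madrid")
graph = {
    base,                               # asserted explicitly
    (base, "starts", date(2009, 7, 1)),
    (base, "ends", date(2018, 7, 10)),
}


def players_since(g, club, since):
    """Mimics: SELECT ?player WHERE { << ?player :team club >> :starts ?date .
    FILTER (?date >= since) }"""
    return sorted(
        quoted[0]
        for quoted, pred, value in g
        if isinstance(quoted, tuple)        # subject is a nested triple
        and quoted[1:] == ("team", club)    # matches << ?player :team club >>
        and pred == "starts"
        and value >= since
    )


print(players_since(graph, "Real_Madrid", date(2009, 7, 1)))  # ['Cristiano_Ronaldo']
print(players_since(graph, "Real_Madrid", date(2010, 1, 1)))  # []
```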
21. Modelling (3) - RDF* and SPARQL*
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
RDF* and SPARQL* can be seen in two ways:
1. Purely syntactic “sugar” on top of standard RDF and SPARQL
a. Can be parsed directly into standard RDF and SPARQL
b. Can be implemented easily by a small wrapper on top of any existing RDF store (DBMS)
2. A logical model in its own right, with the possibility of a dedicated physical schema
a. Extension of the RDF data model and of SPARQL to capture the notion of nested triples
b. Supported by some of the most popular triplestores (e.g. Jena, Blazegraph)
O. Hartig: “Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF.” In Proc. of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2017.
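Interpretation 1(a) can be illustrated by rewriting nested triples into plain RDF. The Python sketch below uses standard reification as the target encoding, which is one possible mapping; the `unstar` function and the `_:stmtN` blank-node naming are hypothetical. Each distinct nested triple gets one statement node, which all of its annotations share; the nested triple itself is described, not asserted.

```python
from itertools import count


def unstar(graph):
    """Rewrite triples whose subject or object is a nested triple into standard RDF reification."""
    ids = count(1)
    node_for = {}   # nested triple -> blank node that stands for it
    out = set()

    def term(t):
        if not isinstance(t, tuple):
            return t                         # ordinary IRI or literal
        if t not in node_for:
            node = f"_:stmt{next(ids)}"
            node_for[t] = node
            s, p, o = (term(x) for x in t)   # nested triples may themselves be nested
            out.update({
                (node, "rdf:type", "rdf:Statement"),
                (node, "rdf:subject", s),
                (node, "rdf:predicate", p),
                (node, "rdf:object", o),
            })
        return node_for[t]

    for s, p, o in graph:
        out.add((term(s), p, term(o)))
    return out


nested = (":Cristiano_Ronaldo", ":team", ":Real_Madrid")
star_graph = {
    (nested, ":starts", '"2009-07-01"'),
    (nested, ":ends", '"2018-07-10"'),
}
for triple in sorted(unstar(star_graph)):
    print(*triple)
```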
22. Modelling (3) - RDF* and SPARQL*
A recent approach that is receiving wider attention and support.
Since 2020, developed within the W3C RDF-DEV Community Group: https://w3c.github.io/rdf-star/
It can also be tested live on YAGO (https://yago-knowledge.org): https://bit.ly/2V4ARXL
23. Modelling (4) - Named Graphs (Quads)
Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th International Conference on World Wide Web. ACM, 2005.
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
Subject Predicate Object NG
Cristiano_Ronaldo team Real_Madrid graph_1
graph_1 starts 2009-07-01 graph_X
graph_1 ends 2018-07-10 graph_X
[Figure: the triple Cristiano_Ronaldo --team--> Real_Madrid sits in graph_1; inside graph_X, graph_1 is linked by starts to 2009-07-01 and by ends to 2018-07-10]
24. Modelling (4) - Named Graphs (Quads)
Subject Predicate Object Starts Ends
Cristiano_Ronaldo team Real_Madrid 1 July 2009 10 July 2018
Subject Predicate Object NG
Cristiano_Ronaldo team Real_Madrid graph_1
graph_1 starts 2009-07-01 graph_X
graph_1 ends 2018-07-10 graph_X
Pros:
1. Intuitive: creates N named graphs for N sources
2. Metadata can be attached to a set of triples
3. Supported by the RDF and SPARQL standards
https://www.w3.org/TR/sparql11-query/#specifyingDataset
Cons:
1. Restricts the use of named graphs to provenance only
2. Requires verbose constructs in queries
One possible serialization is N-Quads, which extends N-Triples with an optional context value (the graph name) at the fourth position:
http://www.w3.org/TR/n-quads/ (W3C Recommendation)
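A Python sketch of the quad layout in the table above, with hypothetical `graphs_containing` and `metadata_of` helpers: first find the graph(s) that contain the triple, then read the metadata attached to that graph name. Because the metadata describes `graph_1` as a whole, it applies to every triple placed in that graph, which is what makes named graphs convenient for sets of triples.

```python
# (subject, predicate, object, graph) quads, as in the table above.
quads = {
    ("Cristiano_Ronaldo", "team", "Real_Madrid", "graph_1"),
    ("graph_1", "starts", "2009-07-01", "graph_X"),
    ("graph_1", "ends", "2018-07-10", "graph_X"),
}


def graphs_containing(qs, s, p, o):
    """Names of the graphs in which the triple (s, p, o) is asserted."""
    return {g for s2, p2, o2, g in qs if (s2, p2, o2) == (s, p, o)}


def metadata_of(qs, s, p, o, key):
    """Values of `key` attached to any graph that contains (s, p, o)."""
    names = graphs_containing(qs, s, p, o)
    return {o2 for s2, p2, o2, _ in qs if s2 in names and p2 == key}


print(graphs_containing(quads, "Cristiano_Ronaldo", "team", "Real_Madrid"))       # {'graph_1'}
print(metadata_of(quads, "Cristiano_Ronaldo", "team", "Real_Madrid", "starts"))   # {'2009-07-01'}
```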
25. Data Provenance with PROV-O - Example
Expressing statements about statements using Named Graphs and PROV-O
[Figure: the named graph :graphName is linked to :Fabrizio via prov:wasAttributedTo]
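The pattern in the figure can be written down as N-Quads. The Python sketch below serializes two quads: the data triple inside the named graph, and a `prov:wasAttributedTo` triple about that graph in the default graph. The `http://example.org/` namespace standing in for `:` is an assumption, only IRIs are handled (no literals or blank nodes), and `to_nquads`/`attributed_agents` are hypothetical helpers.

```python
EX = "http://example.org/"            # assumed namespace for the ':' prefix
PROV = "http://www.w3.org/ns/prov#"   # PROV-O namespace

# (subject, predicate, object, graph); graph None means the default graph.
quads = [
    (EX + "Cristiano_Ronaldo", EX + "team", EX + "Real_Madrid", EX + "graphName"),
    (EX + "graphName", PROV + "wasAttributedTo", EX + "Fabrizio", None),
]


def to_nquads(qs):
    """Serialize IRI-only quads as N-Quads lines; the graph term is omitted for the default graph."""
    lines = []
    for s, p, o, g in qs:
        terms = [s, p, o] + ([g] if g is not None else [])
        lines.append(" ".join(f"<{t}>" for t in terms) + " .")
    return "\n".join(lines)


def attributed_agents(qs, s, p, o):
    """Agents credited (via prov:wasAttributedTo) with any graph that contains (s, p, o)."""
    graphs = {g for s2, p2, o2, g in qs if (s2, p2, o2) == (s, p, o) and g is not None}
    return {o2 for s2, p2, o2, _ in qs if s2 in graphs and p2 == PROV + "wasAttributedTo"}


print(to_nquads(quads))
```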
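28. Modelling (5) - Qualifiers in Wikidata
[Figure, written as triples (readable labels stand in for Wikidata IDs):
wd:Cristiano_Ronaldo wdt:member_of_sports_team wd:Real_Madrid
wd:Cristiano_Ronaldo p:member_of_sports_team wds:Statement
wds:Statement ps:member_of_sports_team wd:Real_Madrid
wds:Statement pq:start_time 2009-07-01
wds:Statement pq:end_time 2018-07-10]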
The prefix p: points not to the object but to a statement node, which is then the subject of other triples.
Within the statement node, the prefix ps: retrieves the object.
Within the statement node, the prefix pq: retrieves the qualifier information.
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
(see: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks)
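The p: / ps: / pq: navigation can be mimicked in Python over a handful of triples. In this sketch the readable names (`member_of_sports_team`, `start_time`, ...) stand in for Wikidata's numeric IDs as on the slide, the statement node `wds:stmt1` is a hypothetical ID, and `qualified_values` is a hypothetical helper. The direct `wdt:` triple gives only the plain value; the qualifiers are reachable only through the statement node.

```python
triples = {
    ("wd:Cristiano_Ronaldo", "wdt:member_of_sports_team", "wd:Real_Madrid"),  # direct value
    ("wd:Cristiano_Ronaldo", "p:member_of_sports_team", "wds:stmt1"),         # to statement node
    ("wds:stmt1", "ps:member_of_sports_team", "wd:Real_Madrid"),              # statement value
    ("wds:stmt1", "pq:start_time", "2009-07-01"),                             # qualifiers
    ("wds:stmt1", "pq:end_time", "2018-07-10"),
}


def qualified_values(g, item, prop):
    """For each p:prop statement of `item`, return (ps: value, {qualifier: value})."""
    results = []
    for s, p, stmt in sorted(g):
        if s != item or p != f"p:{prop}":
            continue
        value = next((o for x, q, o in g if x == stmt and q == f"ps:{prop}"), None)
        qualifiers = {q[len("pq:"):]: o for x, q, o in g if x == stmt and q.startswith("pq:")}
        results.append((value, qualifiers))
    return results


print(qualified_values(triples, "wd:Cristiano_Ronaldo", "member_of_sports_team"))
```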
31. Summary - Statement-Level Metadata in RDF
1) Standard Reification
2) Singleton Property
3) RDF* / SPARQL*
4) Named Graphs (Quads)
5) Wikidata Qualifiers
32. Research in our group…
How can we effectively represent and manage the temporal dynamics and uncertainty of facts in knowledge graphs?
Current activities:
● Model and characterise facts in KGs according to temporal and uncertainty aspects
● Develop solutions for real-time processing, update and propagation of changes in KGs
● Evaluate the developed solutions by applying them to different use cases
33. Research in our group…
- RDF* Observatory: Benchmarking RDF*/SPARQL* engines
https://github.com/dgraux/RDFStarObservatory
- A real-time dashboard for Wikidata edits
- Summarising and verbalising the evolution of KGs with Formal Concept Analysis
- A scalable and efficient storage layer for temporal KGs
34. Some Industrial Use-Cases
1) Finance (temporal aspects)
Data about companies, their shares and markets is complex, widely available and highly time-dependent.
→ See the “Thomson Reuters” and “Bloomberg” KGs
2) Law / Court Cases (uncertainty)
Legal search and Q&A systems over large corpora of court cases need the uncertainty dimension for the output of their various information extraction systems
→ See “Wolters Kluwer’s KG” and Google’s “Knowledge Vault”
3) News & Social Media (dynamics)
Highly time-dependent and uncertain data that needs an efficient solution for managing its dynamics
→ See the “GDELT” Global Knowledge Graph project