1) The document compares different methods for representing statement-level metadata in RDF, including RDF reification, singleton properties, and RDF*.
2) It benchmarks the storage size and query execution time of representing biomedical data using each method in the Stardog triplestore.
3) The results show that RDF* requires fewer triples but the database size is larger, and it outperforms the other methods for complex queries.
2. Statement-Level Metadata in RDF
Subject            | Predicate | Object       | Starts       | Ends
:Cristiano_Ronaldo | :team     | :Real_Madrid | 1 July 2009  | 10 July 2018
:Cristiano_Ronaldo | :team     | :Juventus    | 11 July 2018 | -
How to represent this in RDF?
[Diagram: Cristiano Ronaldo --team--> Real Madrid, with a "?" where the start and end dates should attach]
This is the problem of n-ary (not binary) relations: a single RDF triple can only express a binary one.
3. Modelling (1) - RDF Reification (W3C standard)
Subject           | Predicate | Object      | Starts      | Ends
Cristiano_Ronaldo | team      | Real_Madrid | 1 July 2009 | 10 July 2018

Subject           | Predicate | Object
Cristiano_Ronaldo | team      | Real_Madrid
Stmt1             | type      | Statement
Stmt1             | subject   | Cristiano_Ronaldo
Stmt1             | predicate | team
Stmt1             | object    | Real_Madrid
Stmt1             | starts    | 2009-07-01
Stmt1             | ends      | 2018-07-10

[Diagram: Cristiano_Ronaldo --team--> Real_Madrid; node Stmt1 has type Statement, points to Cristiano_Ronaldo (subject), team (predicate) and Real_Madrid (object), and carries starts 2009-07-01 and ends 2018-07-10]
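The expansion shown on this slide can be sketched in a few lines of Python. This is only an illustration: triples are plain tuples of strings, the helper name `reify` is hypothetical, and the short predicate names (`type`, `subject`, ...) stand in for the `rdf:` vocabulary terms.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reify(statement: Triple, metadata: Dict[str, str], stmt_id: str) -> List[Triple]:
    """Expand one triple plus its metadata into RDF reification triples.

    Hypothetical helper: names are abbreviated as on the slide.
    """
    s, p, o = statement
    triples = [
        statement,                       # the asserted triple itself
        (stmt_id, "type", "Statement"),  # stands in for rdf:type rdf:Statement
        (stmt_id, "subject", s),         # stands in for rdf:subject
        (stmt_id, "predicate", p),       # stands in for rdf:predicate
        (stmt_id, "object", o),          # stands in for rdf:object
    ]
    # Metadata is attached to the statement node, not to the original subject.
    triples += [(stmt_id, key, value) for key, value in metadata.items()]
    return triples


reified = reify(
    ("Cristiano_Ronaldo", "team", "Real_Madrid"),
    {"starts": "2009-07-01", "ends": "2018-07-10"},
    stmt_id="Stmt1",
)
for triple in reified:
    print(triple)
```

With two metadata values, one statement becomes seven triples, which is the overhead the other modelling options try to reduce.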
4. Modelling (2) - Singleton Property
Subject           | Predicate | Object      | Starts      | Ends
Cristiano_Ronaldo | team      | Real_Madrid | 1 July 2009 | 10 July 2018

Subject           | Predicate           | Object
Cristiano_Ronaldo | team#1              | Real_Madrid
team#1            | singletonPropertyOf | team
team#1            | starts              | 2009-07-01
team#1            | ends                | 2018-07-10

[Diagram: Cristiano_Ronaldo --team#1--> Real_Madrid; team#1 is a singletonPropertyOf team and carries starts 2009-07-01 and ends 2018-07-10]

Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property."
In Proceedings of the 23rd international conference on World wide web, ACM, 2014.
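The singleton-property table can be generated the same way. This is a minimal sketch, assuming triples are tuples of strings; the helper name `singleton_property` and the `#<index>` naming scheme simply mirror the slide and are not taken from any library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def singleton_property(statement: Triple, metadata: Dict[str, str], index: int) -> List[Triple]:
    """Model statement-level metadata with a one-off ("singleton") predicate.

    Hypothetical helper: the predicate is made unique by appending '#<index>'.
    """
    s, p, o = statement
    unique_pred = f"{p}#{index}"
    triples = [
        (s, unique_pred, o),                      # the statement, via its own predicate
        (unique_pred, "singletonPropertyOf", p),  # link back to the generic predicate
    ]
    # Metadata is attached to the singleton predicate itself.
    triples += [(unique_pred, key, value) for key, value in metadata.items()]
    return triples


sp_triples = singleton_property(
    ("Cristiano_Ronaldo", "team", "Real_Madrid"),
    {"starts": "2009-07-01", "ends": "2018-07-10"},
    index=1,
)
for triple in sp_triples:
    print(triple)
```

The same statement now takes four triples, at the cost of one new predicate per statement.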
5. Modelling (3) - RDF* and SPARQL* (Hartig et al.)
Subject           | Predicate | Object      | Starts      | Ends
Cristiano_Ronaldo | team      | Real_Madrid | 1 July 2009 | 10 July 2018
RDF extension for nested triples:
    << :Cristiano_Ronaldo :team :Real_Madrid >>
        :starts "2009-07-01" ;
        :ends "2018-07-10" .
SPARQL extension with nested triple patterns:
    SELECT ?player WHERE {
        << ?player :team :Real_Madrid >> :starts ?date .
        FILTER (?date >= "2009-07-01")
    }
https://w3c.github.io/rdf-star/
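To make the idea of a nested triple pattern concrete, here is a toy Python sketch. It is not a SPARQL* engine: the graph is a list of tuples whose subject may itself be a tuple, the function `players_at` is hypothetical, and dates are compared as ISO-formatted strings, as in the FILTER above. The Juventus start date comes from the table on slide 2.

```python
# Toy RDF* graph: (subject, predicate, object), where the subject may be a nested triple.
graph = [
    ((":Cristiano_Ronaldo", ":team", ":Real_Madrid"), ":starts", "2009-07-01"),
    ((":Cristiano_Ronaldo", ":team", ":Real_Madrid"), ":ends", "2018-07-10"),
    ((":Cristiano_Ronaldo", ":team", ":Juventus"), ":starts", "2018-07-11"),
]


def players_at(graph, team, since):
    """Mimic: << ?player :team <team> >> :starts ?date . FILTER (?date >= since)"""
    players = []
    for subject, predicate, obj in graph:
        # Only metadata triples about a nested triple with predicate :starts qualify.
        if not isinstance(subject, tuple) or predicate != ":starts":
            continue
        player, inner_predicate, inner_object = subject
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if inner_predicate == ":team" and inner_object == team and obj >= since:
            players.append(player)
    return players


result = players_at(graph, ":Real_Madrid", "2009-07-01")
print(result)
```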
6. Modelling (3) - RDF* and SPARQL*
Subject           | Predicate | Object      | Starts      | Ends
Cristiano_Ronaldo | team      | Real_Madrid | 1 July 2009 | 10 July 2018

1. Purely syntactic “sugar” on top of standard RDF and SPARQL
   a. Can be parsed directly into standard RDF and SPARQL
   b. Can be implemented easily by a small wrapper on top of any RDF store
2. A logical model in its own right, with the possibility of a dedicated physical schema
   a. Extension of the RDF data model and of SPARQL to capture the notion of nested triples
   b. Supported by some of the most popular triplestores (e.g. Jena, Blazegraph, Stardog...)

O. Hartig: “Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF.”
The 11th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2017.
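The "syntactic sugar" reading (point 1 above) can be illustrated with a small wrapper that rewrites nested triples into plain ones through reification. This is a rough sketch under several assumptions: triples are Python tuples, only a nested triple in subject position and a single nesting level are handled, the embedded triple itself is not asserted, and the `_:stmtN` node names and the function `unfold` are hypothetical.

```python
from itertools import count


def unfold(rdf_star_triples):
    """Rewrite triples with a nested-triple subject into plain triples via reification."""
    statement_nodes = {}  # embedded triple -> identifier of its statement node
    fresh = count(1)
    plain = []
    for s, p, o in rdf_star_triples:
        if isinstance(s, tuple):
            if s not in statement_nodes:
                node = f"_:stmt{next(fresh)}"  # hypothetical blank-node label
                statement_nodes[s] = node
                plain += [
                    (node, "rdf:type", "rdf:Statement"),
                    (node, "rdf:subject", s[0]),
                    (node, "rdf:predicate", s[1]),
                    (node, "rdf:object", s[2]),
                ]
            # Replace the nested triple by its statement node.
            s = statement_nodes[s]
        plain.append((s, p, o))
    return plain


embedded = ("Cristiano_Ronaldo", "team", "Real_Madrid")
rdf_star = [
    (embedded, "starts", "2009-07-01"),
    (embedded, "ends", "2018-07-10"),
]
plain = unfold(rdf_star)
for triple in plain:
    print(triple)
```

Both metadata triples share one statement node, so the two RDF* triples unfold into six plain ones. A store that treats RDF* as a logical model in its own right (point 2) is free to avoid this expansion.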
7. Modelling (3) - RDF* and SPARQL*
A recent and accessible solution that is receiving wider attention and support.
Since Nov 2020, part of the W3C RDF community group: https://w3c.github.io/rdf-star/
- Growing adoption, e.g. Yago 4 (https://yago-knowledge.org)
- Support already started in the most popular triplestores
8. Many options…
1- RDF Reification
2- Singleton Property
3- RDF*
   (1-3: fully flexible / expressive)
4- Named graphs
5- Quads
   (4-5: limited, as they only add one level of information)
6- Wikidata qualifiers
   (6: ad-hoc solution)
9. Need for a Benchmark!
REF - the RDF REiFication benchmark
<https://doi.org/10.5281/zenodo.3894745>
<https://github.com/dgraux/RDFStarObservatory>
• A set of KGs and queries that can be used to compare:
  • Usability
  • Storage size
  • Query execution time
  • Compliance to standards
  • Support by triplestore vendors
10. The Datasets
BKR - Biomedical Knowledge Repository dataset by the U.S. National Library of Medicine.
• Used by V. Nguyen et al. (WWW 2014) to evaluate Singleton Property vs. Reification.
• A biomedical KG containing over 30 million semantic statements extracted from PubMed abstracts and the Unified Medical Language System (UMLS).

                | Reification | Singleton | RDF*
Triples (×10^6) | 175.6       | 100.9     | 61.0
RDF dump (.ttl) | 11.8 GB     | 13.4 GB   | 8.0 GB

Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property."
In Proceedings of the 23rd international conference on World wide web, ACM, 2014.
11. Experiments
• BKR KG in 3 metadata representations:
  • BKR-Reification, BKR-Singleton, BKR-RDF*
• Stored on Stardog 7.3 (natively supporting RDF*)
• Single-node server, 4-core CPU, 32 GB main memory
• Goal: test differences in storage size and query execution time
12. Dataset size: Actual database footprint on disk
(i.e. including triplestore indexes, data structures, etc.)
13. SPARQL / SPARQL* Queries
• “Series A & B” adapted from Nguyen et al. (WWW’14)
  • Series A: 4 queries for each metadata representation
  • Series B: 3 queries for each metadata representation
• Based on real use cases
  • i.e. querying for provenance of biomedical statements
• Varying complexity
  • From 3 to 21 triple patterns in a query (for Reification)
  • Most of them are not complex enough to challenge modern triplestores/infrastructures
14. SPARQL/SPARQL* Queries
Our “Series F”:
- 5 additional queries for each metadata representation
- Requiring greater computation than Series A & B
- Based on real use cases: provenance and temporal information retrieval
17. Discussion
● Considerable differences between the selected reification approaches, even on the same triplestore (Stardog 7.3)
● Compared to Reification, Singleton shows an approx. 40% reduction in the number of triples (100.9M vs. 175.6M), and RDF* shows about 65% fewer triples (61.0M)
  ○ However, in terms of database size on disk, including indexes and other data structures, the ranking is reversed
● Regarding query execution time:
  ○ RDF* was faster for complex query patterns
  ○ Reification was faster for simple queries
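As a quick check on the triple-count comparison, the relative reductions can be recomputed from the figures in the datasets table (175.6, 100.9 and 61.0 million triples). The snippet below is only a worked calculation; the variable and function names are illustrative.

```python
# Triple counts in millions, copied from the datasets table.
triples_millions = {"Reification": 175.6, "Singleton": 100.9, "RDF*": 61.0}


def reduction_percent(baseline: float, other: float) -> float:
    """Relative reduction of `other` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - other) / baseline


baseline = triples_millions["Reification"]
reductions = {
    name: round(reduction_percent(baseline, value), 1)
    for name, value in triples_millions.items()
    if name != "Reification"
}
for name, percent in reductions.items():
    print(f"{name}: {percent}% fewer triples than Reification")
```

The same formula cannot be applied to the on-disk database sizes here, because those values are only shown in a chart that is not part of this transcript.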
18. Conclusions and Future Work
- REF benchmark is available for the research community to test different metadata representations
- It could be used by triplestore vendors to improve their performance, especially for the newer RDF* solutions
- We also discovered different behaviour and internal representations between triplestores implementing RDF* (poster @ESWC’20: <https://github.com/dgraux/RDFStarObservatory>) +
- To liaise with practitioners and the W3C RDF* community group
- Review additional triplestores and extend with new datasets & queries
+ Orlandi, F.; Graux, D.; O'Sullivan, D.; “How Many Stars Do You See in This Constellation?”. Posters & Demos @ ESWC 2020.
19. Thank you!
Fabrizio Orlandi, Damien Graux, Declan O’Sullivan
{orlandif,grauxd}@tcd.ie
Trinity College Dublin, Ireland
<https://doi.org/10.5281/zenodo.3894745>
<https://github.com/dgraux/RDFStarObservatory>
Funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreements No. 801522 and No. 713567, by Science Foundation Ireland, and co-funded by the ADAPT Centre grant No. 13/RC/2106.