6. What is a Contextualized Knowledge Graph?
10/25/2018
A contextualized knowledge graph is a knowledge graph in which
every fact is qualified with a set of contextual properties.
7. Motivation Scenario
Facts:
Subject    Predicate  Object
Bob Dylan  marriedTo  Sara Lownds
Bob Dylan  marriedTo  Carolyn Dennis
Time-aware Facts:
Subject    Predicate  Object          Starts      Ends
Bob Dylan  marriedTo  Sara Lownds     1965-11-22  1977-06-29
Bob Dylan  marriedTo  Carolyn Dennis  1986-06-##  1992-10-##
Meta Queries:
Query type  Sample query
Provenance  P1. Where is this fact from?
            P2. When was it created?
            P3. Who created this fact?
Time        T1. When did this fact occur?
            T2. What is the time span of this fact?
            T3. Which events happened in the same year?
Location    L1. What is the location associated with this fact?
            L2. Which events happened at the same place?
Certainty   C1. What is the author's confidence in this fact?
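A minimal sketch of how the time-aware facts on this slide could be held in memory, and how meta queries T2 and T3 might be answered. The `Fact` class and the helper names are illustrative assumptions, not something defined in the slides. Dates are kept as strings so that the partially unknown value `1986-06-##` stays exactly as given.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """A triple plus a dictionary of contextual properties (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    context: dict = field(default_factory=dict)


facts = [
    Fact("Bob Dylan", "marriedTo", "Sara Lownds",
         {"starts": "1965-11-22", "ends": "1977-06-29"}),
    Fact("Bob Dylan", "marriedTo", "Carolyn Dennis",
         {"starts": "1986-06-##", "ends": "1992-10-##"}),
]


def time_span(fact):
    """T2: return the (starts, ends) pair recorded for a fact."""
    return fact.context.get("starts"), fact.context.get("ends")


def started_in_year(all_facts, year):
    """T3: facts whose start date falls in the given year (first four characters)."""
    return [f for f in all_facts if f.context.get("starts", "")[:4] == str(year)]


print(time_span(facts[0]))
print([f.obj for f in started_in_year(facts, 1986)])
```

Provenance, location, and certainty queries would follow the same pattern with other keys in `context`.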
9. 2973 datasets with 149 billion triples
Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that those names can be looked up
3. When a URI is looked up, provide useful information using the standards
4. Include links to other URIs so that more things can be discovered
11. Form of Triples: RDF Reification
Time-aware Facts:
Subject    Predicate  Object       Starts      Ends
Bob Dylan  marriedTo  Sara Lownds  1965-11-22  1977-06-29
RDF Reification:
Subject    Predicate    Object
#stmt1     type         Statement
#stmt1     hasSubject   BobDylan
#stmt1     hasProperty  marriedTo
#stmt1     hasObject    Sara Lownds
Bob Dylan  marriedTo    Sara Lownds
#stmt1     starts       1965-11-22
#stmt1     ends         1977-06-29
Pros:
1. Intuitive, easy to understand
Cons:
1. Takes 3N triples (4N if the Statement typing is included) to represent N statements => Not scalable
2. No formal semantics defined => Semantics is unclear
3. Discouraged in LOD!
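The reification pattern on this slide can be sketched with plain Python tuples. This is only an illustration under stated assumptions: the function name `reify` is hypothetical, and the property names (`type`, `hasSubject`, `hasProperty`, `hasObject`) follow the simplified labels used in the slide rather than the full RDF vocabulary.

```python
def reify(stmt_id, subject, predicate, obj, metadata, typed=True):
    """Return the reification triples for one statement plus its metadata triples."""
    triples = []
    if typed:
        # Optional typing triple: this is what turns 3N into 4N.
        triples.append((stmt_id, "type", "Statement"))
    triples += [
        (stmt_id, "hasSubject", subject),
        (stmt_id, "hasProperty", predicate),
        (stmt_id, "hasObject", obj),
    ]
    # Metadata is attached to the statement identifier, not to the fact itself.
    triples += [(stmt_id, key, value) for key, value in metadata.items()]
    return triples


triples = reify("#stmt1", "BobDylan", "marriedTo", "SaraLownds",
                {"starts": "1965-11-22", "ends": "1977-06-29"})
for triple in triples:
    print(triple)
```

With `typed=True` the statement costs four structural triples before any metadata is added; with `typed=False` it costs three.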
12. RDF Reification vs. Singleton Property
Time-aware Facts:
Subject    Predicate  Object       Starts      Ends
Bob Dylan  marriedTo  Sara Lownds  1965-11-22  1977-06-29
RDF Reification:
Subject    Predicate    Object
#stmt1     type         Statement
#stmt1     hasSubject   BobDylan
#stmt1     hasProperty  marriedTo
#stmt1     hasObject    Sara Lownds
Bob Dylan  marriedTo    Sara Lownds
#stmt1     starts       1965-11-22
#stmt1     ends         1977-06-29
Singleton Property:
Subject      Predicate    Object
marriedTo#1  rdf:sp       marriedTo
BobDylan     marriedTo#1  Sara Lownds
marriedTo#1  starts       1965-11-22
marriedTo#1  ends         1977-06-29
Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements
using singleton property." In Proceedings of the 23rd international conference on World wide web, pp. 759-770. ACM,
2014.
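The singleton property table on this slide can be generated along these lines. This is a sketch, not the authors' implementation: the function name and the explicit `index` argument are assumptions, and `rdf:sp` is used as the slide's label for the link from a singleton property to its generic property.

```python
def singleton_property_triples(subject, predicate, obj, metadata, index):
    """Build the triples for one fact using a unique (singleton) predicate."""
    sp = f"{predicate}#{index}"  # unique predicate for this one fact
    triples = [
        (sp, "rdf:sp", predicate),  # tie the singleton back to the generic property
        (subject, sp, obj),         # the fact itself, asserted with the singleton
    ]
    # Metadata is attached to the singleton property.
    triples += [(sp, key, value) for key, value in metadata.items()]
    return triples


triples = singleton_property_triples(
    "BobDylan", "marriedTo", "SaraLownds",
    {"starts": "1965-11-22", "ends": "1977-06-29"},
    index=1,
)
for triple in triples:
    print(triple)
```

For this example the result is four triples, compared with the seven rows in the reification table on the same slide.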
13. Form of Triples: PaCE
Provenance-aware Facts:
Subject    Predicate  Object       Source              DateExtracted
Bob Dylan  marriedTo  Sara Lownds  wikipage:Bob_Dylan  2009-06-07
Provenance-aware Context Entity:
Subject        Predicate   Object
BobDylan_wp    rdf:type    Bob Dylan
SaraLownds_wp  rdf:type    Sara Lownds
BobDylan_wp    marriedTo   SaraLownds_wp
BobDylan_wp    hasSource   wiki:Bob_Dylan
BobDylan_wp    hasDateExt  2009-06-07
Pros:
1. Saves ~50% of the triples compared to reification, thanks to the repeated subject, predicate, and object
Cons:
1. Not intuitive, hard to understand
2. Limited expressiveness
Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan. 2010. Provenance
context entity (PaCE): scalable provenance tracking for scientific RDF data. In Proceedings of the 22nd international
conference on Scientific and statistical database management (SSDBM'10).
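The PaCE table on this slide can be approximated as follows. This is a rough sketch under assumptions: the function name is hypothetical, the `_wp` suffix stands for the source context as in the slide, and provenance is attached to the context-specific subject only, which mirrors the slide's example rather than the full PaCE approach.

```python
def pace_triples(subject, predicate, obj, context_suffix, provenance):
    """Create context-specific entities and attach provenance to the subject entity."""
    s_ctx = f"{subject}_{context_suffix}"
    o_ctx = f"{obj}_{context_suffix}"
    triples = [
        (s_ctx, "rdf:type", subject),  # context entity is an instance of the generic one
        (o_ctx, "rdf:type", obj),
        (s_ctx, predicate, o_ctx),     # the fact, stated between context entities
    ]
    triples += [(s_ctx, key, value) for key, value in provenance.items()]
    return triples


triples = pace_triples(
    "BobDylan", "marriedTo", "SaraLownds", "wp",
    {"hasSource": "wiki:Bob_Dylan", "hasDateExt": "2009-06-07"},
)
for triple in triples:
    print(triple)
```

Further facts extracted from the same source can reuse `BobDylan_wp` without repeating its provenance triples, which is where the saving over reification comes from.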
14. PaCE vs. Singleton Property
Facts and Provenance:
Subject    Predicate  Object       Source              DateExtracted
Bob Dylan  marriedTo  Sara Lownds  wikipage:Bob_Dylan  2009-06-07
Provenance-aware Context Entity:
Subject        Predicate   Object
BobDylan_wp    rdf:type    Bob Dylan
SaraLownds_wp  rdf:type    Sara Lownds
BobDylan_wp    marriedTo   SaraLownds_wp
BobDylan_wp    hasSource   wiki:Bob_Dylan
BobDylan_wp    hasDateExt  2009-06-07
Singleton Property:
Subject      Predicate    Object
marriedTo#1  rdf:sp       marriedTo
BobDylan     marriedTo#1  Sara Lownds
marriedTo#1  hasSource    wp:Bob_Dylan
marriedTo#1  hasDateExt   2009-06-07
15. Form of Quadruples: Named Graph
Time-aware Facts:
Subject    Predicate  Object       Starts      Ends
Bob Dylan  marriedTo  Sara Lownds  1965-11-22  1977-06-29
Named Graph:
Subject    Predicate  Object       NG
Bob Dylan  marriedTo  Sara Lownds  ng_1
ng_1       starts     1965-11-22   Prov_graph
ng_1       ends       1977-06-29   Prov_graph
Pros:
1. Intuitive: create N named graphs for N sources
2. Attaches metadata to a set of triples
3. Supported by SPARQL
Cons:
1. Defined for provenance only
2. Ambiguous semantics when associating different types of metadata at the triple level
* Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th international conference on World Wide Web. ACM, 2005.
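The named graph layout on this slide can be sketched as a list of quads. The function names are hypothetical, and the sketch assumes that both metadata quads are attached to the same graph name (`ng_1`) that holds the fact, with the metadata itself stored in `Prov_graph`.

```python
def named_graph_quads(subject, predicate, obj, graph, metadata, meta_graph="Prov_graph"):
    """Place the fact in its own named graph and describe that graph in a metadata graph."""
    quads = [(subject, predicate, obj, graph)]
    quads += [(graph, key, value, meta_graph) for key, value in metadata.items()]
    return quads


def metadata_for(quads, subject, predicate, obj, meta_graph="Prov_graph"):
    """Look up the metadata of a fact via the named graph(s) that contain it."""
    graphs = {g for s, p, o, g in quads if (s, p, o) == (subject, predicate, obj)}
    return {p: o for s, p, o, g in quads if g == meta_graph and s in graphs}


quads = named_graph_quads(
    "Bob Dylan", "marriedTo", "Sara Lownds", "ng_1",
    {"starts": "1965-11-22", "ends": "1977-06-29"},
)
print(metadata_for(quads, "Bob Dylan", "marriedTo", "Sara Lownds"))
```

Because the metadata describes the graph rather than an individual triple, every triple placed in `ng_1` inherits the same `starts` and `ends` values, which illustrates the ambiguity listed under Cons.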
16. Named Graph vs. Singleton Property
Time-aware Facts:
Subject    Predicate  Object       Starts      Ends
Bob Dylan  marriedTo  Sara Lownds  1965-11-22  1977-06-29
Named Graph:
Subject    Predicate  Object       NG
Bob Dylan  marriedTo  Sara Lownds  ng_1
ng_1       starts     1965-11-22   Prov_graph
ng_1       ends       1977-06-29   Prov_graph
Singleton Property:
Subject      Predicate    Object
marriedTo#1  rdf:sp       marriedTo
Bob Dylan    marriedTo#1  Sara Lownds
marriedTo#1  starts       1965-11-22
marriedTo#1  ends         1977-06-29
17. Form of Quintuples: RDF+
Facts and Temporal Information:
Subject    Predicate  Object       Starts      Ends
Bob Dylan  marriedTo  Sara Lownds  1965-11-22  1977-06-29
RDF+:
Subject    Predicate  Object       Meta Property  Meta value
Bob Dylan  marriedTo  Sara Lownds  starts         1965-11-22
Bob Dylan  marriedTo  Sara Lownds  ends           1977-06-29
Cons:
1. The representation is not in RDF form. Statement identifiers are used internally, and mappings from RDF to
RDF+ and back are required.
2. The SPARQL query syntax and semantics need to be extended to support RDF+.
* Dividino, Renata, et al. "Querying for provenance, trust, uncertainty and other meta knowledge in RDF." Web
Semantics: Science, Services and Agents on the World Wide Web 7.3 (2009): 204-219.
18. Experiment: BKR with Provenance
• Five data sets generated from the same seed BKR
  Singleton Property (SP)
  Reification (R)
  PaCE C1 (C1)
  PaCE C2 (C2)
  PaCE C3 (C3)
All datasets are available at http://wiki.knoesis.org/index.php/Singleton_Property
20. External Evaluation
• Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit
Sheth, Olivier Bodenreider, Michel Dumontier. Exposing provenance metadata using
different RDF models. In Proceedings of Semantic Web Applications and Tools for
Life Science (SWAT4LS), 2016.
https://pubchem.ncbi.nlm.nih.gov/
• Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works
well with wikidata?." SSWS@ ISWC 1457 (2015): 32-47.
• Frey, Johannes, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther
Vidal. "Evaluation of Metadata Representations in RDF stores."
• Daniel Hernández, Aidan Hogan, Cristian Riveros, Carlos Rojas, Enzo Zerega:
Querying Wikidata: Comparing SPARQL, Relational and Graph Databases.
International Semantic Web Conference (2) 2016: 88-103
21. Exposing provenance metadata using different RDF models
Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier
Subject                Predicate  Object         Source        FromDataset  Confidence
CID5280961(Genistein)  inhibits   GID2100(ESR2)  PMID12502307  ChEMBL
CID5757(Estradiol)     activates  GID2100(ESR2)  PMID19128016  ChEMBL
22. PubChem
• Five data sets generated from the same seed
  N-ary with cardinal assertion (Model I)
  N-ary without cardinal assertion (Model II)
  Singleton property with cardinal assertion (Model III)
  Singleton property without cardinal assertion (Model IV)
  NanoPublication (Model V)
• Comparing sizes of generated datasets
  Model I     Model II    Model III   Model IV    Model V
  22,787,218  21,445,348  19,575,298  17,239,427  27,605,782
  SP datasets (Models III and IV) are the most compact ones
Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I Furlong, Vinh Nguyen, Amit Sheth, Olivier
Bodenreider, Michel Dumontier. Exposing provenance metadata using different RDF models. In
Proceedings of Semantic Web Applications and Tools for Life Science (SWAT4LS), 2016.
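The size comparison on this slide can be checked with a few lines of arithmetic. The five numbers are copied from the slide; the slide does not state their unit, so the sketch treats them as plain counts and only ranks them and relates each to the largest dataset.

```python
# Dataset sizes as reported on the slide (unit not stated there).
sizes = {
    "Model I": 22_787_218,
    "Model II": 21_445_348,
    "Model III": 19_575_298,
    "Model IV": 17_239_427,
    "Model V": 27_605_782,
}

ranking = sorted(sizes, key=sizes.get)  # smallest first
smallest, largest = ranking[0], ranking[-1]

for model in ranking:
    share = sizes[model] / sizes[largest]
    print(f"{model}: {sizes[model]:,} ({share:.1%} of {largest})")
```

The two singleton property models (III and IV) come out first in the ranking, which matches the slide's statement that the SP datasets are the most compact.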
25. WikiData
• Four data sets generated from the same seed
  Standard Reification (SR)
  N-ary relation (NR)
  Singleton property (SP)
  Named Graph (NG)
• Comparing sizes of generated datasets
  SP dataset is the most compact one
Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with
wikidata?." SSWS@ ISWC 1457 (2015): 32-47.
26. WikiData
• Query performance in 4store and GraphDB
  SP models are not supported by 4store and GraphDB
• Query performance in Virtuoso and BlazeGraph
  Reification and NG are well supported by Virtuoso and BlazeGraph
  SP is a little faster than NR in Virtuoso, but slower in BlazeGraph
27. WikiData
• Six data sets generated from the same seed
  Standard Reification (stdreif)
  N-ary relation (naryrel)
  Singleton property (sgprop)
  Companion property (cpprop)
  Named Graph (ngraphs)
  RDF* (rdr)
• Comparing sizes of generated datasets
  SP dataset is the most compact triple representation
  Fastest in loading time for WikiData
  Best query performance for StarDog in all cases
  Slowest in Virtuoso, but not by much, for WikiData queries
  No performance issues encountered with SP
Frey, Johannes, Kay Müller, Sebastian Hellmann, Erhard Rahm, and Maria-Esther Vidal. "Evaluation of
Metadata Representations in RDF stores."
28. Experimental Comparison
• Dataset size
  SP offers the most concise representation in all cases
• Query performance
  SP performs reasonably well in Virtuoso, best in StarDog, OK in BlazeGraph
  SP may have the potential for a performance gain if supported and optimized by the query engines
Is the SP representation optimal?
34. Current PubChem Neighbor
• Number of links
  92,000,000 * 92,000,000 / 2 = 4.232 * 10^15
  4 quadrillion
• Challenges
  ⨯ Number of triples increases to quadrillions
  ⨯ SPARQL query processing for quadrillions of triples
• Is it worth it?
  Chemical similarity is one of the most important concepts in chemoinformatics
  Similar compounds have similar properties
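The link count on this slide can be reproduced directly. The sketch assumes that the 92,000,000 figure is the number of items being compared pairwise. It computes the slide's estimate n * n / 2 and, for comparison, the exact number of unordered pairs n * (n - 1) / 2; both round to about 4 quadrillion.

```python
n = 92_000_000  # figure taken from the slide

approx_links = n * n // 2       # estimate used on the slide
exact_pairs = n * (n - 1) // 2  # exact count of unordered pairs, no self-links

print(f"slide estimate: {approx_links:,}")
print(f"exact pairs:    {exact_pairs:,}")
print(f"difference:     {approx_links - exact_pairs:,}")
```

The difference between the two is n / 2, which is negligible at this scale, so the slide's rounding does not change the conclusion.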
Semantic Web Technology, enhanced by a massive use of open linked data, plays a crucial role in the overall Deep QA architecture.
CEO Sundar Pichai led the charge here, noting that Google's Knowledge Graph (the easily accessible information that pops up under the search bar for certain queries) now encompasses 70 billion facts.
1163 datasets
Using Semantic Web technologies
149,423,660,620 triples from 2973 datasets (retrieved Dec 14)
Use URIs as names for things
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
Include links to other URIs, so that they can discover more things.
Five datasets
One slide shows the graph database approach
One slide compares the SP and property graph
One slide shows the schema
One slide shows similarity score file
One slide shows the numbers in the schema
One slide shows the numbers for all approaches