Quick overview of an idea for making RDF into a context logic. This idea is currently under discussion and will likely change in the near future, so please don't take it as in any way authoritative or final. Comments are welcome.
1. Blogic in the real world
Adding contexts to RDF
Pat Hayes, Florida IHMC
April 2012
Saturday, April 14, 12 1
2. Blogic
"Blogic" means the logic that actually gets used
on the Semantic Web.
This is not necessarily the way that the
formalisms are officially defined.
3. Blogic in the real world
Most of the actual deployed content on the
semantic web is "linked data", which is billions
of RDF triples with a few tiny sprinkles of
other, more expressive, notations.
The official logic of RDF is pretty trivial (&,∃).
But the way that it actually gets used is different
and rather more complicated.
4. RDF and URIs
The official view is that the 'names' in RDF, ie the URIs, are global in
scope and have fixed, eternal, referents.
This keeps the logic simple, and conforms to an idealized vision of the
Web (cf. TimBL's idea of "cool URIs"). Call this the 2004 globalist ideal.
The actual reality on the Web, however, seems to be that the meaning of
a URI might vary depending on where and when it is used. URI
referents are context-sensitive.
<can-of-worms> Note, this is about what URIs refer to when used as
logical names, not what they "identify" when used by HTTP. These are
two quite distinct ideas. Typically (not always) a URI identifies some
(source of) data about what it refers to. </can-of-worms>
5. RDF and SPARQL
SPARQL is the query language designed to fit with RDF.
SPARQL queries are directed to a datastore (AKA a quad store) which has
one optional 'default' RDF graph and a finite number of 'named' RDF
graphs:
{<a b c> ...}
n1 {<d e f> <g h i>}
n2 {<a d x> <g h i> ...}
n3 ...
SPARQL is now widely deployed in many real-life applications. Unfortunately,
datastores have no official semantics, so are being used in all kinds of ways.
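The structure pictured above can be sketched as a small data structure. This is only an illustration, assuming triples are plain string tuples; the names Dataset and quads are made up for this sketch and are not part of SPARQL or any RDF library.

```python
from dataclasses import dataclass, field

# Assumption: a triple is just (subject, predicate, object) as plain strings.
Triple = tuple[str, str, str]


@dataclass
class Dataset:
    """One optional default graph plus finitely many named graphs."""
    default: frozenset[Triple] = frozenset()
    named: dict[str, frozenset[Triple]] = field(default_factory=dict)

    def quads(self):
        """Yield (graph_name, s, p, o); the default graph uses the name None."""
        for s, p, o in sorted(self.default):
            yield (None, s, p, o)
        for name in sorted(self.named):
            for s, p, o in sorted(self.named[name]):
                yield (name, s, p, o)


# The datastore pictured above, with the elided triples left out.
store = Dataset(
    default=frozenset({("a", "b", "c")}),
    named={
        "n1": frozenset({("d", "e", "f"), ("g", "h", "i")}),
        "n2": frozenset({("a", "d", "x"), ("g", "h", "i")}),
    },
)
quads = list(store.quads())
```

Note that the triple <g h i> appears in both n1 and n2. Nothing in the structure itself says whether those two occurrences mean the same thing, which is exactly the gap the missing semantics leaves open.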
6. SPARQL datastores
A datastore can be:
1. a way to name some graphs
2. a way to keep track of versions of a graph
3. a way to keep track of time-varying data (the graph "name" encodes
times)
4. a way to distinguish data from meta-data (in the default graph)
5. a way to distinguish data depending upon its provenance or source (the
graph name denotes the source)
5a. a way to distinguish data depending upon its topic (the graph name
denotes the topic)
6. a way to keep data sorted into groups which share a common meaning for
the URIs in the graphs (an "island")
7. any combination of the above; or sometimes one of the above, sometimes
another
8. various other things.
7. SPARQL datastores
After a huge amount of debate, discussion, and argument, the RDF WG has
distilled these down to two, and the current discussion is about how to find
a sweet spot between these.
1. a way to name some graphs
6. a way to keep data sorted into groups which share a common meaning for
the URIs in the graphs (an "island")
In 1., the graph name definitely denotes the graph. In 6., it often denotes
something else. This is a problem.
8. SPARQL datastores
Antoine Zimmermann has suggested a model theory for datastores based
on the "island" interpretation, as follows:
In substance, this formalization says that each RDF Graph in a Dataset is interpreted separately. This models the fact that
different RDF Graphs hold in different contexts. This way, graphs that have been put in different "named graph pairs" can
contradict with each other without making the Dataset inconsistent.
Like RDF interpretations, a dataset-interpretation is relative to a vocabulary V. Moreover, dataset interpretations are defined
with respect to an entailment regime E, as defined in SPARQL 1.1 Entailment Regimes. Let KE be the set of all E-
interpretations. The interpretation of an RDF Dataset (G, (<n1>,Gn1), ..., (<nk>,Gnk)) over vocabulary V is a pair (I,Con) where I
is an E-interpretation of G (the default graph) and Con is a mapping from V to KE.
A dataset-interpretation (I,Con) of a vocabulary V wrt entailment regime E satisfies an RDF Dataset (G, (<n1>,Gn1), ...,
(<nk>,Gnk)) iff I E-satisfies G, and for all i in [1..k], Con(ni) exists and E-satisfies Gni.
Following standard definitions, we say that a dataset D=(G, (<n1>,Gn1), ..., (<nk>,Gnk)) entails a dataset (H, (<m1>,Hm1), ...,
(<mp>,Hmp)) iff every dataset-interpretation (I, Con) that satisfies D also satisfies H.
What this does is to treat each named graph as existing in its own local
context, with its URIs treated as different in meaning from the same URI
occurring elsewhere. Call this the graph-local vision. Nothing could be more
different from the 2004 globalist ideal.
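A toy sketch of the graph-local idea, to make the satisfaction condition concrete. It is not the real RDF model theory: an interpretation here is just a map from URIs to things plus a set of facts that hold, there is no entailment regime, and the names Interp and dataset_satisfied and the ex: URIs are all invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Interp:
    """Toy interpretation: what each URI denotes, and which facts hold."""
    denotes: dict    # URI -> thing in the universe
    facts: frozenset  # (thing, property-thing, thing) tuples that hold

    def satisfies(self, graph) -> bool:
        # A graph is satisfied when every triple maps onto a fact that holds.
        # A URI outside this interpretation's vocabulary leaves the triple unsatisfied.
        for s, p, o in graph:
            if not all(u in self.denotes for u in (s, p, o)):
                return False
            fact = (self.denotes[s], self.denotes[p], self.denotes[o])
            if fact not in self.facts:
                return False
        return True


def dataset_satisfied(default_interp, con, default_graph, named_graphs) -> bool:
    """(I, Con) satisfies the dataset iff I satisfies G and each Con(n_i) satisfies G_ni."""
    if not default_interp.satisfies(default_graph):
        return False
    return all(
        name in con and con[name].satisfies(graph)
        for name, graph in named_graphs.items()
    )


# Two named graphs use ex:Paris; each gets its own interpretation.
n1 = {("ex:Paris", "ex:locatedIn", "ex:France")}
n2 = {("ex:Paris", "ex:locatedIn", "ex:Texas")}
con = {
    "n1": Interp(
        {"ex:Paris": "paris_fr", "ex:locatedIn": "located_in", "ex:France": "france"},
        frozenset({("paris_fr", "located_in", "france")}),
    ),
    "n2": Interp(
        {"ex:Paris": "paris_tx", "ex:locatedIn": "located_in", "ex:Texas": "texas"},
        frozenset({("paris_tx", "located_in", "texas")}),
    ),
}
top = Interp({}, frozenset())
ok = dataset_satisfied(top, con, set(), {"n1": n1, "n2": n2})
```

Here ex:Paris denotes one city in n1 and another in n2, and the dataset as a whole is still satisfied because no single interpretation has to cover both graphs.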
9. SPARQL datastores
Sandro Hawke is running with the naming idea, and has a proposal for
distinguishing between an actual name for a graph and a mere label (ie a
URI used as a "graph name" in the datastore but not actually denoting the
graph.) This treats the labeling relationship as a functional RDF property
with the constraint that if A is a graph then (A label B) implies A=B, and
then the combination
{ <name> rdf:type rdf:Graph }
...
<name> { ... graph1... }
...
forces this particular labeling to be a genuine naming. (This allows other
labelings to not be names, which is widely used.)
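The naming pattern can be checked mechanically. The sketch below assumes triples are plain string tuples and uses the rdf:type and rdf:Graph terms exactly as they appear on this slide; the function name genuine_names and the example labels are made up.

```python
RDF_TYPE = "rdf:type"
RDF_GRAPH = "rdf:Graph"  # as written on the slide; a proposal, not settled vocabulary


def genuine_names(default_graph, named_graphs):
    """Return the labels that the default graph declares to be graphs.

    For these labels the labeling is forced to be a genuine naming;
    every other label may denote something other than its graph.
    """
    return {
        label
        for label in named_graphs
        if (label, RDF_TYPE, RDF_GRAPH) in default_graph
    }


default_graph = {(":g1", RDF_TYPE, RDF_GRAPH)}
named_graphs = {
    ":g1": {(":a", ":b", ":c")},      # declared to be a graph: a real name
    ":alice": {(":d", ":e", ":f")},   # a mere label, e.g. a source of the data
}
names = genuine_names(default_graph, named_graphs)
```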
10. Web contexts
Trying to make sense of all this leads to a vision of RDF on the Web as being a context
logic. Let me call this RDFC. RDFC extends RDF with a notion of 'web context'.
A web context represents a social agreement concerning the meaning of a vocabulary of URIs,
called the reserved vocabulary of the context. Asserting a graph in a context means that one is
making a commitment to use the reserved vocabulary in a way that conforms to the agreement.
The agreement may be explicit or implicit, and it may or may not be accessible in some
form from a URI used to indicate the context.
The most explicit and formal case would be a coined URI which identifies via HTTP an
RDF graph document which completely formalizes the semantic constraints of the
context, with the understanding that the URI denotes this RDF graph. Call this a graph
context. However, not all contexts can be represented as graph contexts. The other extreme
is that a URI may be used to indicate a context without any explanation or definition of
the semantic restrictions it is intended to impose. This is legal, although of limited utility.
Being a context is a role rather than a classification. Anything can be treated as a context
(just as anything in RDF can be a property or a class.) A given URI may therefore
identify one thing via HTTP, denote another thing, and be used to indicate a context, all
at the same time.
11. RDFC syntax
RDFC looks just like RDF, but RDF graphs are understood to always be
asserted in some context, indicated by a URI.
To assert a graph G in a context C, simply include the triple
< > rdf:inherits C .
in G. rdf:inherits is transitive, of course. ( < > means "this graph".)
If C is a graph context, this means exactly what owl:imports means now, by the
way, so it shouldn't be too revolutionary an idea :-)**
A 'bare' assertion of an RDF graph which has no rdf:inherits triple (like all
such assertions to date) is understood to be made in the default topmost
context, called rdf:, which defines the meaning of the RDF namespace.
**(Footnote) Noticing this similarity between owl:imports and Cyc's context inheritance is what led to
the current proposal.
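A rough sketch of how a processor might work out which contexts a graph is asserted in. The assumptions: triples are plain string tuples, the string "<>" stands for "this graph", and the inheritance property is spelled rdf:inherits as on the later slides. The function name contexts_of and the :medical and :science contexts are invented for the example.

```python
RDF_INHERITS = "rdf:inherits"
TOP_CONTEXT = "rdf:"
THIS_GRAPH = "<>"


def contexts_of(graph, context_links=()):
    """Return every context the graph is asserted in.

    graph         -- triples of the graph itself
    context_links -- known inheritance triples between contexts
    """
    direct = {o for s, p, o in graph if s == THIS_GRAPH and p == RDF_INHERITS}
    if not direct:
        # A 'bare' assertion is made in the default topmost context.
        return {TOP_CONTEXT}

    # Index context-to-context inheritance so it can be followed transitively.
    edges = {}
    for s, p, o in context_links:
        if p == RDF_INHERITS:
            edges.setdefault(s, set()).add(o)

    seen = set()
    stack = list(direct)
    while stack:
        ctx = stack.pop()
        if ctx in seen:
            continue  # also guards against inheritance cycles
        seen.add(ctx)
        stack.extend(edges.get(ctx, ()))

    seen.add(TOP_CONTEXT)  # the top context is inherited by default by all others
    return seen


bare = {(":a", ":b", ":c")}
scoped = {(THIS_GRAPH, RDF_INHERITS, ":medical"), (":a", ":b", ":c")}
links = [(":medical", RDF_INHERITS, ":science")]
```

With these inputs, contexts_of(bare) gives only the top context, while contexts_of(scoped, links) also picks up :medical and, by transitivity, :science.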
12. Context inheritance
The topmost context is called rdf: and defines the RDF namespace as defined by the
2004 RDF specification documents. This is a default, so all existing RDF graphs are
understood to be asserted in it. Asserting in this context is accepting the 2004 globalist
ideal. If this were the only context, RDFC would be identical to 2004 RDF.
The other extreme is to assert a graph in itself, considered as a context. This effectively
declares all its non-reserved URIs as reserved to it, and hence separates them in
meaning from the same URIs used outside the graph. This gives Antoine's semantics
for graphs named in a SPARQL dataset, ie the graph-localist perspective on graph
meaning. One could do this using Sandro's naming trick as follows:
{ :name rdf:type rdf:Graph }
:name { :name rdf:inherits :name . <other triples of the graph> }
Note that :name is the graph itself (Sandro's convention) and is also the context in
which this graph is asserted (our rule for rdf:inherits) giving the pattern
required. Note also how one can use a URI denoting a graph to also indicate a
context.
13. RDFC syntax
RDFC syntax requires:
1. a way to assert a graph in a context
2. a way to specify the reserved vocabulary of a context
3. a way to describe the semantic conditions imposed on the reserved vocabulary by the context.
4. a way to assert that one context inherits another
1. is done using rdf:inherits. We will assume that 4. is also described the same way.
Right now, we do not give any general formal syntax for 2. and 3., allowing users to define
their own methods, perhaps informally. (In order to be used by inference engines, an algorithm
must be provided which decides, for any URI, whether or not it is in the reserved vocabulary,
and the semantic constraint must be expressible as a determinate condition on RDF
interpretations of the reserved vocabulary. This can be done by, for example, specifying a set of
axioms and inference rules which must be valid on the interpretations, but also by a direct
mathematical description of the valid interpretations.)
In the case of a graph context, the non-reserved vocabulary of the context graph is the reserved
vocabulary of the defined context, and the semantic constraint is that the context graph be
true.
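One way to picture requirements 2. and 3. is as a pair of functions: a decision procedure for the reserved vocabulary and a determinate test on interpretations. This is a sketch only, with an interpretation reduced to a URI-to-thing map; the Context class, its field names, and the ex: example are all hypothetical, since no formal syntax is given here.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a toy interpretation is just a map from URI to the thing it denotes.
Interpretation = dict


@dataclass
class Context:
    # Requirement 2: decides, for any URI, whether it is in the reserved vocabulary.
    is_reserved: Callable[[str], bool]
    # Requirement 3: a determinate condition on interpretations of that vocabulary.
    constraint: Callable[[Interpretation], bool]

    def admits(self, interp: Interpretation) -> bool:
        """Check the constraint against the reserved part of an interpretation only."""
        reserved_part = {u: x for u, x in interp.items() if self.is_reserved(u)}
        return self.constraint(reserved_part)


# Hypothetical context: reserves every ex: URI and fixes what ex:Person means.
living = Context(
    is_reserved=lambda uri: uri.startswith("ex:"),
    constraint=lambda i: i.get("ex:Person") == "living-humans",
)
accepted = living.admits({"ex:Person": "living-humans", "foaf:name": "name"})
rejected = living.admits({"ex:Person": "all-humans"})
```

The context has no say over foaf:name, because that URI is outside its reserved vocabulary.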
14. rdf:inherits
All the RDFC context structure (both inheritance between contexts and
assertion of a graph in a context) is done with the single property
rdf:inherits, and since this property is part of the rdf: namespace whose
meaning is fully determined by the rdf: top context inherited by default by all
others, its meaning cannot be changed. So RDFC does not allow 'contextual
assertions of contexts' or any other oddities. The context structure itself is
global.
15. RDFC model theory
An RDFC interpretation of a vocabulary V is an RDF interpretation I of V
together with a mapping con from the universe U of I to the set of RDF
interpretations over subsets of V whose universes are subsets of U. Define voc(x)
to be the vocabulary of con(x).
The interpretation of a URI uuu in a context ccc is defined to be
con(I(ccc))(uuu) if uuu is in voc(I(ccc)), otherwise I(uuu).
A triple
sss rdf:inherits ooo
is true in I just when voc(I(ooo)) is the reserved vocabulary specified for a
context denoted by the URI ooo, and con(I(sss)) satisfies the semantic
conditions specified for a context denoted by the URI ooo.
The remaining truth recursions for triples, graphs, blank nodes, etc. are exactly
as in the 2004 RDF model theory.
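The rule for interpreting a URI in a context can be written out directly. In this sketch interpretations are reduced to plain URI-to-thing maps, and a thing with no entry in con is treated as having an empty vocabulary, which simplifies the definition above. The class name RDFCInterp and the :Person and :living example are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class RDFCInterp:
    base: dict                               # I: URI -> thing in the universe
    con: dict = field(default_factory=dict)  # con: thing -> local map (URI -> thing)

    def voc(self, thing):
        """voc(x): the vocabulary of con(x); empty if x has no local interpretation."""
        return set(self.con.get(thing, {}))

    def denote(self, uri, context_uri):
        """con(I(ccc))(uuu) if uuu is in voc(I(ccc)), otherwise I(uuu)."""
        ctx = self.base[context_uri]
        if uri in self.voc(ctx):
            return self.con[ctx][uri]
        return self.base[uri]


interp = RDFCInterp(
    base={
        "rdf:": "top-context",
        ":living": "living-context",
        ":Person": "all-humans",
        ":Alice": "alice",
    },
    con={"living-context": {":Person": "living-humans"}},
)
globally = interp.denote(":Person", "rdf:")
locally = interp.denote(":Person", ":living")
unreserved = interp.denote(":Alice", ":living")
```

Only :Person is in the vocabulary of the :living context, so it alone changes meaning there; :Alice keeps its global referent.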
16. Context inheritance
In RDFC we have the globalist and localist views as extreme cases
within one framework, but we also have more useful cases. Since
users can define their own contexts and link them to other contexts
and to RDF data, new semantic conditions can be introduced, defined
and named 'in the field' without necessitating the elaborate and
expensive WG review process needed to define a new 'web standard'.
And since contexts can be published and linked to, we have a way for
the RDF/linked-data community to use URIs to refer to things in
more nuanced ways than they can at present.
For the usefulness of contexts, see Lenat's papers on the topic from
the Cyc project (this proposal is almost exactly like the microtheories
machinery implemented in CycL, transcribed to a Web context.)
17. Some examples
1. Current entailment regimes (RDFS, OWL, RIF) can be viewed as contexts (and
identified using existing URIs, so we now have a realistic way to refer to them in RDF
itself), but we can also define new ones, eg the {owl:sameAs, owl:FunctionalProperty}
subset used in FOAF.
2. Time-dependent properties can be described as such in a context definition, which also
specifies how its subcontexts can register temporal information. (Use case 3)
3. Topics or information sources can be used as context indicators for RDF information
relevant to the topic or derived from the source. (Use cases 5 and 5a)
4. Progressively more 'refined' meanings can be indicated by contexts without inventing
new vocabulary, eg the class name :Person might mean all human beings, all living human
beings, all living American citizens in three successive subcontexts. (Lenat reports on the
usefulness of this in Cyc.)
5. Contexts provide a degree of useful referential opacity, eg an owl:sameAs asserted in one
context might cease to be true in a subcontext when more refined meanings are in use (eg
chemical elements vs. chemical isotopes)