Semantic Web languages: Expressivity vs scalability
Nicola Vitucci
Dipartimento di Elettronica e Informazione
Politecnico di Milano
December 17, 2012
Summary
1 Introduction
2 Semantic Web languages
3 Description Logics
4 Queries
5 Storage
6 Conclusions
Introduction
Semantic Web languages
Semantic Web languages are built on the notion of the Semantic
Web, an “extended version” of the Web where metadata
semantically enrich the content of a Web page
They are used in several applications for:
Building a knowledge base (a “richer” database where queries can
also be performed on the ER model itself)
Providing a shared vocabulary
Integrating different sources of information
Discovering new information by performing automatic reasoning
Introduction
The Semantic Web “layer cake”
[Figure: the Semantic Web “layer cake” diagram, taken from http://www.w3.org/2007/Talks/0130-sb-W3CTechSemWeb/#(24)]
Semantic Web languages
RDF, RDFS, OWL
RDF is a data model
describing resources and
their relations
RDFS provides a structure
for RDF resources
OWL (and the newer
version OWL 2) is a family
of three languages which
extend RDFS:
OWL Full
OWL DL
OWL Lite
[Figure: diagram relating RDFS, OWL Lite, OWL DL and OWL Full]
All the languages can be serialized using formats such as
RDF/XML, N3, N-Triples or Turtle
Semantic Web languages
OWL 2 profiles
OWL 2 DL can be seen as a “group” of sublanguages called
profiles:
OWL 2 EL, suitable for big and
relatively simple taxonomies
OWL 2 QL, suitable for
conjunctive queries on many
instances
OWL 2 RL, a sort of
“compromise” between
expressivity and scalability
inspired by rule-based
reasoning
[Figure: diagram of the EL, QL and RL profiles inside OWL DL and OWL Full]
Recent proposal: OWL-LD (OWL for the Linked Data)
→ http://semanticweb.org/OWLLD/
Semantic Web languages
Too many languages?
Are there too many OWLs?
“OWL 2 is the standard,
let’s use it”
Too easy to say!
Several issues:
Complexity of reasoning
Representation needs
Queries
Storage
OWL 2 profiles have been introduced to solve such issues by
sacrificing some “power” of OWL 2
Description Logics
OWL and the Description Logics
The OWL (2) DL language belongs to the family of description
logics (DLs). Description logics:
are a family of logics (in the mathematical sense)
are more powerful than propositional logic
are less powerful than First Order Logic (FOL) but decidable
have a formal semantics which makes it possible to build ontologies
and to reason over them
Description Logics
Key concepts in DLs
The key elements are to be understood within the framework of
set theory
Individuals (single elements)
Concepts: sets of instances
Roles: relations between instances
Terminology is expressed through TBox axioms such as:
Researcher ⊑ Employee
ResearchCompany ≡ Company ⊓ ∃hasEmployee.Researcher
Factual information about individuals is represented by ABox
axioms such as:
a : C (concept assertion)
(a, b) : R (role assertion)
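The set-theoretic reading above can be sketched in a few lines of Python. This is only a toy interpretation under stated assumptions: the individuals (alice, bob, carol, acme, shop) and the helper names `exists` and `satisfies_subsumption` are hypothetical and do not come from any reasoner API. Concepts are plain sets, roles are sets of pairs, and an axiom holds in the interpretation when the corresponding set relation holds.

```python
# Toy interpretation: every name below is a hypothetical example.
# Concepts are sets of individuals, roles are sets of (subject, object) pairs.
concepts = {
    "Researcher": {"alice", "bob"},
    "Employee": {"alice", "bob", "carol"},
    "Company": {"acme", "shop"},
    "ResearchCompany": {"acme"},
}
roles = {
    "hasEmployee": {("acme", "alice"), ("shop", "carol")},
}


def exists(role, filler):
    """Extension of ∃role.filler: subjects with at least one successor in filler."""
    return {s for (s, o) in roles[role] if o in concepts[filler]}


def satisfies_subsumption(sub, sup):
    """C ⊑ D holds in this interpretation iff ext(C) is a subset of ext(D)."""
    return sub <= sup


# TBox axiom: Researcher ⊑ Employee
researcher_is_employee = satisfies_subsumption(
    concepts["Researcher"], concepts["Employee"]
)

# TBox axiom: ResearchCompany ≡ Company ⊓ ∃hasEmployee.Researcher
rhs = concepts["Company"] & exists("hasEmployee", "Researcher")
research_company_defined = concepts["ResearchCompany"] == rhs

# ABox assertions: a : C is set membership, (a, b) : R is pair membership
alice_is_researcher = "alice" in concepts["Researcher"]
acme_employs_alice = ("acme", "alice") in roles["hasEmployee"]
```

Checking one interpretation is not reasoning: a reasoner has to consider every interpretation that satisfies the axioms, which is where the complexity discussed later comes from.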
Description Logics
Basic DLs
Several basic DLs exist, among which:
AL: provides atomic concept negation (¬C, where C is an atomic
concept), concept intersection (C ⊓ D), universal restrictions
(∀R.C) and limited existential quantification (∃R.⊤)
EL: provides concept intersection and full existential quantification
(∃R.C)
Such logics can be extended by the use of several constructs (see
next slide)
Description Logics
Constructs
Symbol  Meaning
E       full existential quantification
U       concept union (C ⊔ D)
C       complex concept negation (¬D); includes U and E
H       role hierarchy (R ⊑ S, where R and S are roles)
R       inverse roles, intersection and union of roles etc., reflexivity and
        irreflexivity, role disjointness; includes H
O       nominals (Letter ≡ {a, b, c}, RedObject ≡ ∃hasColor.{red})
I       inverse properties (S ≡ R⁻)
F       functional properties
N       cardinality restrictions (e.g. C ≡ ≥n R, with n ≥ 0); includes F
Q       qualified cardinality restrictions (e.g. C ≡ ≥n R.D, with n ≥ 0);
        includes N
(D)     datatype properties (e.g. strings, numbers etc.)

S is an alias for ALC⁺ (ALC with transitive roles), EL++ for ELRO
Description Logics
Complexity
The complexity of a DL depends on the constructs it supports
OWL 1 Lite = SHIF(D) (restricted)
OWL 1 DL = SHOIN(D)
OWL 2 DL = SROIQ(D)
OWL 2 EL is based on EL++
OWL 2 QL is based on DL-Lite, a subset of ALC using
optionally H, F, N
OWL 2 RL is based on Description Logic Programs (DLP),
sharing many features with OWL Lite
How complex are the reasoning tasks then?
Description Logics
Complexity
The complexity of reasoning tasks depends not only on which
constructs the logic supports, but also on how they are
combined:
ALCQI, ALCQO: PSpace
ALCIO: ExpTime (I and O together raise the complexity)
ALCQIO, SHOIN, SHOIQ: NExpTime (I + O + N/Q)
SROIQ: 2NExpTime
Thus, care should be taken when considering the constructs which
are really needed for one’s application
More on complexity of reasoning in description logics:
http://www.cs.man.ac.uk/~ezolin/dl/
Description Logics
Complexity
Language   Reasoning problems¹      Complexity²
OWL 2 DL   Cons, Sat, Sub, Check    2NExpTime-Complete
           Query                    ???
OWL 2 EL   Cons, Sat, Sub, Check    PTime-Complete
           Query                    ExpTime-Complete
OWL 2 QL   Cons, Sat, Sub, Check    NLogSpace-Complete
           Query                    NP-Complete
OWL 2 RL   Cons                     PTime-Complete
           Sat, Sub, Check          co-NP-Complete
           Query                    NP-Complete

¹ Cons: Ontology Consistency, Sat: Class Expression Satisfiability,
Sub: Class Expression Subsumption, Check: Instance Checking,
Query: Conjunctive Query Answering
² More about complexity on http://www.w3.org/TR/owl2-profiles/#Computational_Properties
Description Logics
Sources of complexity
Sources of complexity for a DL include:
Non-determinism: disjunction (or negation and conjunction),
maximum cardinality restrictions
Exponential complexity: combination of ∃ and ∀
For this reason, all the OWL 2 profiles disallow or restrict the use
of such constructs (see next slide)
Description Logics
Use of profiles
Intersection: allowed everywhere except on the left side in OWL 2 QL;
Union: allowed only on the left side, and only in OWL 2 RL
(although A ⊔ B ⊑ C is the same as A ⊑ C, B ⊑ C, so this
does not add to the complexity);
Negation: allowed only on the right side in OWL 2 RL/QL;
Inverses: allowed in OWL 2 RL/QL but not in OWL 2 EL;
Existential quantifiers: fully allowed in OWL 2 EL, allowed with
restrictions on the left side in OWL 2 QL, allowed only on the left
side in OWL 2 RL;
Universal quantifiers: allowed in OWL 2 RL (on the right side)
but not in OWL 2 EL/QL.
Description Logics
The EL profile
No inverse or symmetric properties, disjunctions, negations
The EL profile is suitable for biomedical ontologies such as
SNOMED
Example axiom:
ViralUpperRespiratoryTractInfection ≡
UpperRespiratoryInfection ⊓ ViralRespiratoryInfection
⊓ ∃CausativeAgent.Virus
⊓ ∃FindingSite.UpperRespiratoryTractStructure
⊓ ∃PathologicalProcess.InfectiousProcess
Suitable reasoners: Snorocket, CEL, jCEL, ELK
Often individuals are not supported
Queries are reasoner-based
Description Logics
The RL profile
Inference as a set of rules
Has universal quantifiers, inverses, (a)symmetric properties
Constructs are restricted on the two sides of a subclass axiom
This plays a role in inference
D ⊑ ∃R.C (not allowed) is different from ∃R.C ⊑ D (allowed)
thus, equivalences such as D ≡ ∃R.C are not allowed
Suitable reasoners: OWLIM, Jena
Queries can be performed on the model or on the instances
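The idea of "inference as a set of rules" can be illustrated with a small forward-chaining loop. This is a minimal sketch, not the algorithm of OWLIM or Jena: the function name `forward_chain`, the tuple encodings and the example individuals are all assumptions made for illustration. It handles only two rule shapes, C ⊑ D and ∃R.C ⊑ D, and applies them until no new fact appears.

```python
def forward_chain(types, edges, subclass_axioms, exists_axioms):
    """Apply two rule shapes until a fixpoint is reached.

    types:            set of (individual, class) assertions
    edges:            set of (subject, role, object) assertions
    subclass_axioms:  list of (C, D) meaning C ⊑ D
    exists_axioms:    list of (R, C, D) meaning ∃R.C ⊑ D
    """
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        new_facts = set()
        # Rule 1: C ⊑ D and x : C  =>  x : D
        for (x, c) in inferred:
            for (sub, sup) in subclass_axioms:
                if c == sub:
                    new_facts.add((x, sup))
        # Rule 2: ∃R.C ⊑ D, (x, y) : R and y : C  =>  x : D
        for (r, c, d) in exists_axioms:
            for (x, role, y) in edges:
                if role == r and (y, c) in inferred:
                    new_facts.add((x, d))
        if not new_facts <= inferred:
            inferred |= new_facts
            changed = True
    return inferred


# Hypothetical example data
types = {("alice", "Researcher")}
edges = {("acme", "hasEmployee", "alice")}
subclass_axioms = [("ResearchCompany", "Company")]
exists_axioms = [("hasEmployee", "Researcher", "ResearchCompany")]

result = forward_chain(types, edges, subclass_axioms, exists_axioms)
```

The sketch also hints at why D ⊑ ∃R.C is excluded: a rule for it would have to invent a new, unnamed individual as the R-successor, which rules of this kind over the existing facts cannot do.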
Description Logics
Representation needs
Value partition
Can use nominals instead of classes, but this would require the O
constructor and would prevent further partitions
“An object can be long, medium or short”
Object ⊑ ∃hasLength.Length
Length ≡ Long ⊔ Medium ⊔ Short (all subclasses are disjoint)
N-ary (object or datatype) properties
“A ball is painted with a color by a certain percentage”
Painting ⊑ ∃color.Color ⊓ ∃percentage.Percentage
hasPainting ∘ color ⊑ hasColor
hasPainting ∘ percentage ⊑ hasPercentage
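The two role chains can be read as relational composition. The sketch below assumes hypothetical individuals (ball1, p1, p2) and a helper named `compose`; it computes the pairs that the chains force into hasColor and hasPercentage.

```python
def compose(first, second):
    """Relational composition: (x, z) with (x, y) in first and (y, z) in second."""
    return {(x, z) for (x, y1) in first for (y2, z) in second if y1 == y2}


# Hypothetical ABox: one ball with two Painting individuals
has_painting = {("ball1", "p1"), ("ball1", "p2")}
color = {("p1", "red"), ("p2", "blue")}
percentage = {("p1", 70), ("p2", 30)}

# hasPainting ∘ color ⊑ hasColor: every composed pair must be in hasColor
has_color = compose(has_painting, color)
# hasPainting ∘ percentage ⊑ hasPercentage
has_percentage = compose(has_painting, percentage)
```

The derived properties lose the pairing between a color and its percentage (ball1 is red and blue, 70 and 30), which is why the intermediate Painting individual is needed in the first place.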
Description Logics
Representation needs
Exceptions
“Birds have feathers and fly, penguins are birds but they don’t fly”
Bird ⊑ ∃hasFeathers.⊤
FlyingBird ⊑ Bird ⊓ ∃hasAbility.Fly
NonFlyingBird ⊑ Bird ⊓ ¬∃hasAbility.Fly
Some of these situations can be modeled using Ontology Design
Patterns (ODPs), but it is necessary to assess the required
expressivity
Description Logics
Fuzzy extensions
“The world is not black or white”
“How old is an adult?”
“A basketball has to be round and orange:
what is more important?”
Fuzzy extensions and weighted axioms
require a higher expressivity
Adult ⊑ ∃age.right-shoulder(0,100,20,40)
Basketball ≡ Round0.75 Orange0.25
Available reasoners:
fuzzyDL (f-SHIF + other fuzzy constructs)
FiRE (f-SHIN)
There is no standard yet
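As an illustration of the "How old is an adult?" example, a right-shoulder membership function can be sketched as follows. The reading of the four parameters is an assumption: the first two (0, 100) are taken as the bounds of the age domain and the last two (20, 40) as the points where membership starts rising and where it reaches 1. The function name and signature are hypothetical and are not the API of fuzzyDL or FiRE.

```python
def right_shoulder(k1, k2, a, b):
    """Return a membership function over [k1, k2]: 0 up to a, linear on [a, b], 1 after b."""
    if not (k1 <= a < b <= k2):
        raise ValueError("expected k1 <= a < b <= k2")

    def membership(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    return membership


# Degree to which a given age counts as "adult" under the assumed reading
adult = right_shoulder(0, 100, 20, 40)
degree_at_18 = adult(18)
degree_at_30 = adult(30)
degree_at_45 = adult(45)
```

Under this reading an 18-year-old is not adult at all, a 30-year-old is adult to degree 0.5, and anyone aged 40 or more is fully adult.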
Queries
Querying a knowledge base
Conjunctive query answering is “non-standard” reasoning
SPARQL queries:
work on ABox and TBox
are not always supported
over entailments
allow for a weak form of
closed world assumption
scale well on big knowledge
bases
are low-level and difficult to
use for TBox queries
DL queries:
are limited to the TBox
are not always supported by
all reasoners
do not allow for closed
world negation
can be slow when reasoning
with many individuals
are easy to write and
interpret
SPARQL-DL/SPARQL-OWL queries:
“bridge” between the two approaches
are not (yet) a W3C standard
do not have “industrial” strength (are still experimental)
Queries
SPARQL queries
Queries on instances are very flexible due to the power of the
SPARQL language, which in the 1.1 version supports:
Property paths
Aggregates (COUNT, SUM, MIN, MAX, AVG)
Subqueries
Updates
A weak form of CWA (using MINUS and NOT EXISTS)
Queries
DL queries
By contrast, DL queries written in SPARQL are complicated
Example:
“Find C where ∃hasShape.Round ⊑ C”
PREFIX [...]
SELECT DISTINCT ?q
WHERE {
?x rdfs:subClassOf ?q ;
a owl:Restriction ;
owl:onProperty :hasShape ;
owl:someValuesFrom :Round
}
Queries
Inference in big KBs
Support for reasoning is needed if the language used goes beyond
RDFS
Some inference can be performed using SPARQL itself (e.g.
class hierarchy using property paths)
If a more expressive language is used, two choices:
a reasoner makes inferred data available
a reasoner rewrites the query in order to incorporate the ontology
When the knowledge base is big, several strategies can be used:
Query approximation
Theory approximation
Ontology modularization
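The remark about computing the class hierarchy with property paths corresponds to following rdfs:subClassOf zero or more times. The sketch below mimics that traversal in plain Python over a dictionary; the class names, the individuals and the helper names `superclasses` and `instances_of` are hypothetical, and no triple store is involved.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable through zero or more subClassOf steps."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, types, sub_class_of):
    """Individuals whose asserted class is cls or one of its (indirect) subclasses."""
    return {x for (x, c) in types if cls in superclasses(c, sub_class_of)}


# Hypothetical asserted hierarchy and type assertions
sub_class_of = {
    "Penguin": ["NonFlyingBird"],
    "NonFlyingBird": ["Bird"],
    "FlyingBird": ["Bird"],
}
types = {("tweety", "Penguin"), ("polly", "FlyingBird"), ("rex", "Dog")}

birds = instances_of("Bird", types, sub_class_of)
```

This covers only the subclass hierarchy; constructs beyond RDFS still need one of the two reasoner-based choices listed above.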
Storage
Storage
When a knowledge base is big, suitable storage solutions are
needed
“Traditional” approaches use single files for every ontology
(reasoning is performed on in-memory models)
Triple (or quad) stores can store and retrieve many triples
efficiently
OWLIM (OWLIM-Lite, OWLIM-SE, OWLIM-Enterprise)
Jena (TDB, SDB, with PostgreSQL)
Sesame
AllegroGraph
OpenLink Virtuoso
Dydra (storage in the cloud)
Is inference performed? How?
Custom engines vs existing engines
Rule-based engines: forward chaining vs backward chaining
Storage
OWLIM
Website: http://owlim.ontotext.com
Family of three semantic repositories of industrial strength
Uses a rule engine supporting RDFS, OWL-Horst, OWL 2 QL,
OWL 2 RL
Supports the full SPARQL 1.1 (+ Update)
VERY scalable: the Lite (free) version scales up to tens of
millions of triples
Storage
Dydra
Website: http://dydra.com
Software as a Service (SaaS) with proprietary implementation
Quad store, no reasoning, supports most of SPARQL 1.1
The SPARQL endpoints can be tried (with and without inferences):
http://dydra.com/nick/milantransport/sparql
http://dydra.com/nick/milantransport_inf/sparql
PREFIX : <http://www.semanticweb.org/owlapi/ontologies/MilanTransportOntology#>
SELECT DISTINCT ?n
WHERE {
  ?f :name "S.BABILA" .
  ?f :connected{2} ?t .
  ?t :name ?n
  FILTER(?t != ?f)
}
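What the query asks for can be sketched in plain Python, reading `:connected{2}` as exactly two consecutive :connected steps. Everything except the name "S.BABILA" is an assumption: the other stop names are placeholders, stops are identified directly by their names instead of by separate nodes with a :name property, and the helper `two_hops` is hypothetical.

```python
def two_hops(start, connected):
    """Stops reachable from start in exactly two 'connected' steps, excluding start."""
    reached = set()
    for middle in connected.get(start, set()):
        reached |= connected.get(middle, set())
    reached.discard(start)  # plays the role of FILTER(?t != ?f)
    return reached


# Hypothetical symmetric connections between stops
connected = {
    "S.BABILA": {"STOP_B", "STOP_C"},
    "STOP_B": {"S.BABILA", "STOP_D"},
    "STOP_C": {"S.BABILA", "STOP_E"},
    "STOP_D": {"STOP_B"},
    "STOP_E": {"STOP_C"},
}

names = two_hops("S.BABILA", connected)
```

Returning a set gives the same effect as SELECT DISTINCT: a stop reachable through several intermediate stops is reported once.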
Conclusions
Conclusions
Semantic Web and huge data sources are becoming more and
more popular
Reasoning should scale well, but the whole point of DLs is to be
expressive
Different approaches to representation and to reasoning are
needed
Research is moving towards scalable reasoning for expressive
logics
The (sub)language, the storage model, the inference engine and
the query language have to be chosen as a whole
Reasoners for expressive languages make it appealing to use
their own APIs for queries and are currently used mostly for query
rewriting, but they may not scale well to large amounts of data
Native storage can be extremely scalable for big ABoxes and
makes it possible to use standard query languages such as
SPARQL, but such use is complex and the supported
(sub)languages are less expressive than full OWL
The level of expressivity and the expected scale should be
assessed beforehand