I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key to providing intelligent applications with sophisticated ways of interpreting data. To support my claim, I will picture scenarios of a possible, not-so-distant future. I will argue that current Semantic Web Patterns are not sufficient to address the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches, and human computation.
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC
1. Fueling the future with Semantic Web Patterns
Valentina Presutti
STLab, Institute of Cognitive Sciences and Technologies, CNR, Rome (IT)
WOP 2014, October 19th, Riva del Garda (IT)
2. Outline
• Can we implement the original Semantic Web scenario?
• Knowledge sources heterogeneity problem
• Semantic alignment at pattern level
• Knowledge Patterns as key elements
• Some STLab results on KP-based knowledge extraction
• A possible research direction to pattern alignment
• Conclusion
3. What’s the message?
Knowledge Patterns are a wormhole in
the Web to knowledge interpretation and
understanding
4. We all want a Personal Assistant Robot!
• Answering our questions
• Giving opinion on facts and things
• Providing guidelines for procedures
• Solving our problems
• Planning and reminding our schedule
WOODY
5. WOODY
“Pete and Lucy could use their agents to carry
out all these tasks thanks not to the World Wide
Web of today but rather the Semantic Web that
it will evolve into tomorrow.”
–Tim Berners-Lee, James Hendler and Ora Lassila, 2001
6. Today is 13 years later
How would we implement it?
8. Background knowledge
We want WOODY to read and understand
background knowledge and use it in a smart way
Heterogeneity
Structured and Unstructured data
Syntactic and Semantic interoperability
9. Heterogeneity
Syntactic interoperability
• To unify the format of
knowledge sources
enabling e.g. distributed
query
Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the
Semantic Web, Morgan & Claypool Publishers 2011
10. Semantic interoperability
• Making sense of distributed
data
• Enabling their automatic
interpretation
• Different semantic
perspectives must be
addressed
11. Semantic interoperability
An ontology is a formal
specification of a shared
conceptualisation
This definition is valid for any Semantic Web
knowledge resource
12. Semantic interoperability:
formal specification
• Shared knowledge
representation language
• Semantic interoperability to
the extent of its formal
semantics
rdfs:subClassOf
owl:sameAs
rdfs:subPropertyOf
owl:equivalentProperty
owl:equivalentClass
13. Semantic interoperability:
conceptualisation
• We have to cope with
knowledge sources
conceptualisations
• Aligning knowledge sources
at a conceptual level
formal specification
knowledge representation
cognition
conceptualisation
15. Semantic alignment 1+2+3
• One-by-one alignment
of classes, properties
and individuals
Xianpei Han, Le Sun, Jun Zhao: Collective entity linking in web text: a graph-based method, Proceedings of SIGIR 2011, ACM.
Euzenat, Jérôme, Shvaiko, Pavel: Ontology Matching 2nd ed. 2013, Springer.
16. Semantic alignment 1+2+3
• Alignment to foundational
theories, e.g. DOLCE
• They provide a universal
reference framework from
which to derive all
possible consequences,
inferences, errors.
• Assumption: foundational
theory axioms always hold
dul:Agent
dul:NaturalPerson
Daniel Oberle et al., DOLCE ergo SUMO: On foundational and domain models in the SmartWeb Integrated Ontology (SWIntO). J. Web Sem. 5(3): 156-174 (2007)
Aldo Gangemi, Nicola Guarino, Claudio Masolo, Alessandro Oltramari, Luc Schneider: Sweetening Ontologies with DOLCE. EKAW 2002: 166-181
Prateek Jain et al.: Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton
Smith B, Rosse C.: The role of foundational relations in the alignment of biomedical ontologies. Stud Health Technol Inform. 2004;107(Pt 1):444-8
17. Semantic alignment 1+2+3
• They provide a decontextualized view on data
• It is not enough for contextualized interoperability:
making sense of data for a certain interactive/
cognitive task
Alignment one-by-one
Alignment to
foundational theories
18. Imagine we are interested in comparing the governors of California based
on the laws they created.
To select the information that is relevant to our task, we need to
extract only those facts that are framed by certain political concepts and relations.
21. The boundary problem
ex:law_dp_CA_2010 rdf:type ex:Law
ex:law_dp_CA_2010 ex:creator dbpedia:Arnold_Schwarzenegger
ex:law_dp_CA_2010 ex:jurisdiction dbpedia:California
ex:law_dp_CA_2010 ex:name ex:drug_policy_CA_2010
ex:law_dp_CA_2010 ex:creationTime "2010"^^xsd:date
ex:law_dp_CA_2010 ex:forbidden "marijuana possession of up to one ounce"
lmdb:Terminator rdf:type lmdb:film
lmdb:Terminator lmdb:actor dbpedia:Arnold_Schwarzenegger
lmdb:Terminator lmdb:date "1984"^^xsd:date
lmdb:Terminator lmdb:director dbpedia:James_Cameron
lmdb:Terminator lmdb:sequel dbpedia:Terminator_2
dbpedia:Arnold_Schwarzenegger rdf:type dbpedia-owl:Office_Holder
dbpedia:Arnold_Schwarzenegger dbpprop:predecessor dbpedia:Lee_Haney
dbpedia:California_foie_gras_law dbpprop:governor dbpedia:Arnold_Schwarzenegger
Aldo Gangemi, Valentina Presutti: Towards a pattern science for the Semantic Web. Semantic Web 1(1-2): 61-68 (2010)
23. Semantic alignment 1+2+3
• We need interoperability at the level of groups of
relations that together identify specific
interpretational contexts
• We need local reference theories defining
conceptual boundaries -> Knowledge Patterns*
*(cf. Gangemi & Presutti, 2010)
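A minimal sketch of the difference between one-by-one alignment and a pattern-level boundary, using a subset of the triples from the boundary problem slide. The `LAW_MAKING_KP` predicate group and the function names are assumptions made for illustration; they are not part of any published Knowledge Pattern.

```python
# A subset of the boundary-problem triples, as (subject, predicate, object).
TRIPLES = [
    ("ex:law_dp_CA_2010", "rdf:type", "ex:Law"),
    ("ex:law_dp_CA_2010", "ex:creator", "dbpedia:Arnold_Schwarzenegger"),
    ("ex:law_dp_CA_2010", "ex:jurisdiction", "dbpedia:California"),
    ("lmdb:Terminator", "rdf:type", "lmdb:film"),
    ("lmdb:Terminator", "lmdb:actor", "dbpedia:Arnold_Schwarzenegger"),
    ("lmdb:Terminator", "lmdb:sequel", "dbpedia:Terminator_2"),
    ("dbpedia:Arnold_Schwarzenegger", "dbpprop:predecessor", "dbpedia:Lee_Haney"),
]

# Hypothetical "law making" pattern: the group of relations that, taken
# together, delimit the interpretational context we are interested in.
LAW_MAKING_KP = {"rdf:type", "ex:creator", "ex:jurisdiction"}
LAW_MAKING_TYPE = "ex:Law"


def mentions(entity, triples):
    """One-by-one view: every triple that mentions the entity, in any context."""
    return [t for t in triples if entity in (t[0], t[2])]


def within_pattern(entity, triples, kp_predicates, kp_type):
    """Pattern view: only facts about situations of the pattern's type that
    involve the entity through one of the pattern's relations."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o == kp_type}
    linked = {s for s, p, o in triples
              if s in typed and p in kp_predicates and o == entity}
    return [t for t in triples if t[0] in linked and t[1] in kp_predicates]


arnold = "dbpedia:Arnold_Schwarzenegger"
unbounded = mentions(arnold, TRIPLES)
bounded = within_pattern(arnold, TRIPLES, LAW_MAKING_KP, LAW_MAKING_TYPE)
```

Here `unbounded` mixes the law, the film and the bodybuilding predecessor, while `bounded` keeps only the triples describing the law, which is the kind of selection the governor comparison task needs.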
26. Top-down resources
• Linguistic resources: FrameNet,
VerbNet, Corpus Pattern Analysis
• Ontology Design Patterns
(Content Patterns)
• EarthCube content patterns
• Component Library
• Cyc micro theories
• Data model patterns (David C.
Hay)
• Infobox templates, microformats
All of them define patterns that
provide conceptual context for
representing data
27. Knowledge extraction
methods
• Entity Linking based on
key discovery (almost-key
discovery*)
• Data/graph mining:
frequent itemset/
subgraphs, anomalies
• NLP: frame detection,
event extraction
* Danai Symeonidou: Automatic key discovery for Data Linking, PhD Thesis, 2014.
They all mine data, looking for patterns that make it
possible to make sense of the data.
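As an illustration of the data mining item above, a minimal sketch of frequent itemset counting over the properties used by each subject. The toy data and the function name `frequent_property_sets` are made up for the example; real miners (e.g. Apriori-style algorithms) prune the search space instead of enumerating all combinations.

```python
from collections import Counter
from itertools import combinations


def frequent_property_sets(subject_properties, size, min_support):
    """Count how many subjects use each combination of `size` properties and
    keep the combinations used by at least `min_support` subjects."""
    counts = Counter()
    for props in subject_properties.values():
        for combo in combinations(sorted(set(props)), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Toy, made-up dataset: subject -> properties it is described with.
DATA = {
    "film1": ["actor", "director", "date"],
    "film2": ["actor", "director", "sequel"],
    "film3": ["actor", "director", "date"],
    "law1": ["creator", "jurisdiction", "date"],
}

frequent_pairs = frequent_property_sets(DATA, size=2, min_support=2)
```

Property groups that recur across many subjects, such as `("actor", "director")` here, are candidate patterns; whether they are meaningful still has to be assessed.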
28. KP hypothesis
Independently of the specific data structure or
knowledge representation format, certain patterns
share the same intensional meaning
29. Three heterogeneous knowledge sources (different data structures, different formats),
but sharing the same intensional meaning, i.e. each describes (models) a cooking situation
Knowledge Pattern
33. Cognitive foundations of KPs
• People tend to remember items that fit into a
schema (cf. Bartlett and a lot of CS since then)
• In particular, schemas that are associated with
some functional similarity (cf. Gibson’s
affordances)
• A schema is similar to a (conceptual) frame, script,
or knowledge pattern
34. How to represent KPs
• Class or property punning (with KP description)
• Property domain/range axiom punning (with KP roles)
• Typed named graphs
• OWL ontology modules (cf. ODP)
• SPARQL query patterns, SPIN patterns
• hasKey patterns
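One of the options above, SPARQL query patterns, can be sketched as follows. This is a hypothetical in-memory description of a KP (a situation type plus named roles bound to properties) that is rendered as a SPARQL SELECT query; the class `KnowledgePattern`, the `ex:` names and the omission of PREFIX declarations are assumptions made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgePattern:
    """Hypothetical KP description: a situation type and its roles."""
    name: str
    type_iri: str
    roles: dict = field(default_factory=dict)  # role name -> property

    def to_sparql(self):
        """Render the pattern as a SPARQL SELECT query (prefixes omitted)."""
        variables = " ".join(f"?{role}" for role in self.roles)
        lines = [
            f"SELECT ?situation {variables} WHERE {{",
            f"  ?situation a {self.type_iri} .",
        ]
        lines += [f"  ?situation {prop} ?{role} ."
                  for role, prop in self.roles.items()]
        lines.append("}")
        return "\n".join(lines)


LAW_MAKING = KnowledgePattern(
    name="LawMaking",
    type_iri="ex:Law",
    roles={"creator": "ex:creator", "jurisdiction": "ex:jurisdiction"},
)
QUERY = LAW_MAKING.to_sparql()
```

The generated query selects every situation of the pattern's type together with the fillers of its roles, so the same KP description can be reused against any source that is aligned to it.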
36. Pattern alignment
• Investigating the application of similarity
measures to complex structures
• Vector spaces, graph matching, structure
matching, etc.
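A minimal sketch of one such similarity measure, assuming each pattern is reduced to a set of (domain, relation, range) edges and that a candidate label mapping is already available. The two cooking patterns and the mapping are invented for illustration; Jaccard overlap is only a simple stand-in for proper graph matching.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pattern_similarity(edges_a, edges_b, label_map):
    """Rename the labels of pattern A through `label_map`, then compare the
    resulting edge set with pattern B."""
    renamed = {tuple(label_map.get(x, x) for x in edge) for edge in edges_a}
    return jaccard(renamed, edges_b)


# Two hypothetical patterns describing a cooking situation.
RECIPE_A = {
    ("Recipe", "hasIngredient", "Food"),
    ("Recipe", "hasStep", "Action"),
    ("Recipe", "author", "Person"),
}
RECIPE_B = {
    ("Dish", "ingredient", "Food"),
    ("Dish", "step", "Action"),
    ("Dish", "cuisine", "Country"),
}
# Assumed label correspondences between the two sources.
LABEL_MAP = {"Recipe": "Dish", "hasIngredient": "ingredient", "hasStep": "step"}

score = pattern_similarity(RECIPE_A, RECIPE_B, LABEL_MAP)
```

Two of the four distinct edges coincide after renaming, so the score is 0.5; the hard part, which this sketch takes as given, is finding the label mapping.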
37. Pattern alignment
• Network alignment (cf.
Roded Sharan*)
• Modular structure of
conserved clusters among
yeast, worm, and fly
• Multiple network alignment
revealed 183 conserved
clusters.
*Roded Sharan et al.: Conserved patterns of protein interaction in multiple species, PNAS, 2005.
41. Schema induction of linked datasets based on patterns.
Patterns are built around central concepts and used for automatic design of SPARQL queries
Centrality discovery in datasets
Figure labels: mo:Track, mo:track, mo:MusicArtist, mo:Playlist, mo:Torrent, tags:taggedWithTag, tags:Tag, mo:Record, foaf:maker, mo:image, dc:date, rdfs:Literal, dc:title, dc:description, mo:available_as
Valentina Presutti, Lora Aroyo, Alessandro Adamou, Balthasar Schopman, Aldo Gangemi, Guus Schreiber: Extracting Core Knowledge from Linked Data. COLD2011, CEUR-WS.org Vol-782.
42. Encyclopedic Knowledge
Patterns: example
• An Encyclopedic Knowledge Pattern (EKP) is discovered from the
paths emerging from the Wikipedia page link structure
• EKPs are represented as OWL2 ontologies
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti, Paolo Ciancarini: Encyclopedic Knowledge
Patterns from Wikipedia Links. International Semantic Web Conference (1) 2011: 520-536
43. Using Encyclopedic Knowledge Patterns for browsing Wikipedia
Serendipity in exploratory browsing
http://www.aemoo.org
Andrea Giovanni Nuzzolese, Valentina Presutti, Aldo Gangemi, Alberto Musetti, Paolo
Ciancarini: Aemoo: exploring knowledge on the web. WebSci 2013: 272-275
Aemoo: exploratory search based on EKP - Semantic Web
Challenge @ISWC 2011 – Short listed, 4th place
44. KP-based machine reading with FRED
http://wit.istc.cnr.it/stlab-tools/fred/
Valentina Presutti, Francesco Draicchio, Aldo Gangemi: Knowledge Extraction Based on
Discourse Representation Theory and Linguistic Frames. EKAW 2012: 114-129
45. KP-based machine reading with FRED
http://wit.istc.cnr.it/stlab-tools/fred/
The New York Times reported that John McCarthy
died. He invented the programming language LISP.
From natural language to linked data graphs, which are
designed including event- and frame-based patterns
46. Relation discovery and property generation
http://wit.istc.cnr.it/kore-dev/legalo
f-measure=.83
Exploiting event- and frame-based
patterns for relation discovery
Valentina Presutti et al. Uncovering the semantics of
Wikipedia pagelinks. EKAW 2014.
47. Superimposing sentic frames on event- and frame-based linked
data graphs representing opinions, for sentiment analysis
Sentic frames from text
http://wit.istc.cnr.it/stlab-tools/sentilo
50. • Hybridisation is the common factor of these
methods
• We are still far from solving the pattern alignment problem
• KP-based design of knowledge sources can
support an easier procedure for pattern alignment
52. KP hypothesis
Independently of the specific data structure or
knowledge representation format, certain patterns
share the same intensional meaning
53. Building a KP distributed system
Leveraging different techniques for knowledge extraction:
• Event extraction, ontology matching, social network analysis, frame detection, data mining, graph mining, rules
Unifying their results by representing them as KPs:
• Events, correspondence patterns, unusual records, frames, association rules, frequent subgraphs, anomalies, frequent itemsets
KP distributed system
The KP system starts with potentially approximate and incomplete
patterns and evolves to become more and more robust and
accurate thanks to continuous feedback
54. Knowledge pattern system
• Inspired by Minsky’s
frame-systems
• Statistical methods
can help to identify
relations between
KPs:
• co-occurrence,
causality,
triggering, etc.
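A minimal sketch of how a co-occurrence relation between KPs could be estimated, assuming each document has already been annotated with the set of KPs detected in it. Pointwise mutual information (PMI) is used here as one possible statistic; the toy annotations are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def kp_pmi(documents):
    """Pointwise mutual information for every pair of KPs that co-occur in at
    least one document. Positive scores mean the pair co-occurs more often
    than expected if the two KPs were independent."""
    n = len(documents)
    single = Counter()
    pair = Counter()
    for kps in documents:
        kps = set(kps)
        single.update(kps)
        pair.update(combinations(sorted(kps), 2))
    scores = {}
    for (a, b), joint in pair.items():
        scores[(a, b)] = math.log2(
            (joint / n) / ((single[a] / n) * (single[b] / n))
        )
    return scores


# Toy, made-up annotations: the KPs detected in each of four documents.
DOCS = [
    {"Respond_to_proposal", "Quarreling"},
    {"Respond_to_proposal", "Quarreling"},
    {"Cooking"},
    {"Cooking", "Quarreling"},
]

pmi = kp_pmi(DOCS)
```

Pairs with a clearly positive score are candidates for links in the KP system; co-occurrence alone does not say whether the link is causality, triggering or something else.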
56. A reviewing complaint case
• Imagine someone gets a paper rejection …
• … and comments on Facebook …
57. If we want to enable smart reasoning on
heterogeneous sources we need a way to relate data
like this paper’s review with this FB status
58. KP entailment
E.g. Patrick Pantel’s “Verb Ocean”
reject [can-result-in] argue :: 11.634112
fn:Respond_to_proposal vo:can-result-in fn:Quarreling
59. reject ⊑ Respond_to_proposal, argue ⊑ Quarreling
x ∈ Interlocutor.respond_to_proposal = k ∈ Arguer1.quarreling
y ∈ Speaker.respond_to_proposal = m ∈ Arguer2.quarreling
z ∈ Proposal.respond_to_proposal ≈ n ∈ Issue.quarreling
⊢ reject(r,x,y,z,…) entails argue(s,k,m,n,…)
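The entailment above can be sketched in a few lines. The verb relation and its weight are taken from the slide; the role correspondences (Interlocutor to Arguer1, Speaker to Arguer2, Proposal to Issue) are one reading of the slide and should be treated as an assumption, as are all function names and the example occurrence.

```python
# Verb-level relation with its weight, as quoted on the slide.
VERB_RELATIONS = [("reject", "can-result-in", "argue", 11.634112)]

# Verbs as lexical units of frames.
LEXICAL_UNIT_TO_FRAME = {"reject": "Respond_to_proposal", "argue": "Quarreling"}

# Assumed role correspondences between the two frames.
ROLE_MAP = {
    ("Respond_to_proposal", "Quarreling"): {
        "Interlocutor": "Arguer1",
        "Speaker": "Arguer2",
        "Proposal": "Issue",
    }
}


def lift_to_frames(verb_relations, lu_to_frame):
    """Turn verb-level relations into frame-level (KP-level) relations."""
    return [
        (lu_to_frame[v1], rel, lu_to_frame[v2], weight)
        for v1, rel, v2, weight in verb_relations
        if v1 in lu_to_frame and v2 in lu_to_frame
    ]


def entailed_occurrences(occurrence, frame_relations, role_map):
    """Given one frame occurrence (frame name, role fillers), return the
    occurrences it may result in, with fillers carried over to mapped roles."""
    frame, fillers = occurrence
    results = []
    for source, _rel, target, weight in frame_relations:
        if source == frame and (source, target) in role_map:
            mapping = role_map[(source, target)]
            new_fillers = {mapping[r]: v for r, v in fillers.items() if r in mapping}
            results.append((target, new_fillers, weight))
    return results


FRAME_RELATIONS = lift_to_frames(VERB_RELATIONS, LEXICAL_UNIT_TO_FRAME)
# Hypothetical occurrence extracted from a paper review.
review = ("Respond_to_proposal",
          {"Interlocutor": "author", "Speaker": "reviewer", "Proposal": "paper"})
possible = entailed_occurrences(review, FRAME_RELATIONS, ROLE_MAP)
```

With the rejection extracted from the review, the sketch predicts a possible Quarreling situation between the same participants about the same paper, which is what would let a system relate the review to the Facebook status.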
60. However…
• Automatic methods
are never 100%
accurate
• Regularities can
emerge as statistically
significant even if
they are not relevant
• We need procedures
and metrics for
validating KPs
http://tylervigen.com/
61. Patterns vs KP
• A pattern is a motivated structure that is proposed
by experts or emerges from inductive methods
• A KP formalises the intensional description of a
class of situations, events, cases, etc.
• When is a proposed or emerging pattern a KP?
• Real data are dirty: spurious correlations
• How to single out spurious ones?
62. “Human is the measure of all things.”
–Protagoras, ~450 B.C.
63. We need humans in the cycle
• Techniques: ontology matching, social network analysis, frame detection, data mining, graph mining, rules, event extraction
• Results: correspondence patterns, unusual records, frames, association rules, frequent subgraphs, anomalies, frequent itemsets, events
• Crowdsourcing methods
Marco Fossati, Claudio Giuliano, Sara Tonelli: Outsourcing
FrameNet to the Crowd. ACL (2) 2013: 742-747
VideoGames with a purpose applied to semantic tasks
http://knowledgeforge.org/, Roberto Navigli
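A minimal sketch of how human judgements could feed the validation step, assuming each candidate pattern receives yes/no votes from crowd workers on whether it describes a meaningful class of situations. The thresholds, candidate names and function name are illustrative assumptions, not a validated protocol.

```python
def validate_candidates(judgements, min_votes=3, min_agreement=0.6):
    """Sort candidate patterns into accepted, rejected and undecided, based on
    the share of positive votes. Candidates with too few votes, or without a
    clear majority either way, stay undecided and go back to the crowd."""
    outcome = {"accepted": [], "rejected": [], "undecided": []}
    for candidate, votes in judgements.items():
        if len(votes) < min_votes:
            outcome["undecided"].append(candidate)
            continue
        share = sum(votes) / len(votes)
        if share >= min_agreement:
            outcome["accepted"].append(candidate)
        elif share <= 1 - min_agreement:
            outcome["rejected"].append(candidate)
        else:
            outcome["undecided"].append(candidate)
    return outcome


# Toy, made-up votes (True = "this is a meaningful pattern").
VOTES = {
    "cooking_situation": [True, True, True, False],
    "spurious_correlation": [False, False, False],
    "law_making": [True, False],
}

validated = validate_candidates(VOTES)
```

Accepted candidates would be promoted to KPs, rejected ones filtered out as spurious, and undecided ones sent through the cycle again.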
65. Conclusion
• We are less than halfway to implementing the original Semantic Web scenario
• A significant step forward is introducing semantic interoperability at pattern level
• This requires the hybridisation of knowledge extraction methods as well as the
reconciliation of patterns with different provenance (data mining, graph
mining, ontology patterns, etc.)
• Knowledge Patterns are key elements for enabling such hybridisation
• Knowledge Patterns should be organised as a distributed linked system where
links are relations enabling smart reasoning
• A distributed KP system is a resource that evolves through a feeding cycle, which
includes human computation
66. Special thanks to:
Aldo Gangemi, Malvina Nissim, Misael Mongiovì, Claudia d’Amato for their help
and inspiring discussions.