3. Semantic Computing for coping with the long tail of data variety
[Figure: long-tail curve. Axes: frequency of use vs. # of entities and attributes. Regions along the tail: relational, NoSQL, schema-less, unstructured. Annotations: more knowledge; full data coverage; full automation; full knowledge.]
6. Robust Semantic Model
Semantic intelligent behavior is highly dependent on
knowledge scale (commonsense, semantic)
Semantics = Formal meaning representation model (lots of data) + inference model
7. Robust Semantic Model
Not scalable!
1st Hard problem: Acquisition
Semantics = Formal meaning representation model (lots of data) + inference model
8. Robust Semantic Model
Not scalable!
2nd Hard problem: Consistency
Semantics = Formal meaning representation model (lots of data) + inference model
9. Robust Semantic Model
Not scalable!
3rd Hard problem: Performance
Semantics = Formal meaning representation model (lots of data) + inference model
10. Semantics for a Complex World
“Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”
“If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.”
Baroni et al. 2013
Formal World vs. Real World
11. Distributional Semantic Models
Semantic Model with low acquisition effort
(automatically built from text)
Simplification of the representation
Enables the construction of comprehensive
commonsense/semantic KBs
What is the cost?
Some level of noise
(semantic best-effort)
Limited semantic model
12. Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend
to be semantically similar”
“He filled the wampimuk with the substance, passed it
around and we all drunk some”
Harris, 1954; McDonald & Ramscar, 2001; Baroni & Boleda, 2010
13. Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the
leash since he barked.”
contexts = nouns and verbs in the same
sentence
14. Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the
leash since he barked.”
contexts = nouns and verbs in the same sentence
[Figure: vector space with context dimensions bark, park, leash; “dog” plotted as a vector]
Context counts for “dog”:
bark : 2
park : 1
leash : 1
owner : 1
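The counting step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method behind any particular DSM: the word lists are hand-lemmatised (the slide does not say how tagging or lemmatisation is done), the verb "put" is left out so that the result matches the counts listed on the slide, and `build_cooccurrence` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Nouns and verbs per sentence, lemmatised by hand (an assumption).
# "put" is omitted so the counts agree with the ones on the slide.
sentences = [
    ["dog", "bark", "park"],            # "The dog barked in the park."
    ["owner", "dog", "leash", "bark"],  # "The owner of the dog put him on the leash since he barked."
]

def build_cooccurrence(sentences):
    """For every target word, count the other words that share a sentence with it."""
    vectors = defaultdict(Counter)
    for words in sentences:
        for i, target in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    vectors[target][context] += 1
    return vectors

vectors = build_cooccurrence(sentences)
print(dict(vectors["dog"]))
```

Each row of `vectors` is the context vector of one word. A real model would be built from a large corpus and would usually weight or reduce these raw counts.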
19. Shift in the Database Landscape
Very large and dynamic “schemas”.
before 2000: 10s-100s attributes
circa 2015: 1,000s-1,000,000s attributes
Brodie & Liu, 2010
20. Databases for a Complex World
How do you query data in this scenario?
22. Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Semantic gap:
- Abstraction level differences
- Lexical variation
- Structural (compositional) differences
Schema-agnostic query mechanisms
23. Proposed Approach
Who is the daughter of Bill Clinton married to?
Abstraction level differences
Lexical variation
Structural (compositional) differences
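One way to picture how lexical variation might be bridged is to rank schema terms by their distributional similarity to each query term. The sketch below is an illustration under loud assumptions: the three-dimensional vectors are made-up numbers, the schema terms (`child`, `spouse`, `birthPlace`) are a hypothetical vocabulary, and `cosine` and `best_match` are hypothetical helpers rather than the approach's actual API.

```python
import math

# Toy context vectors with invented values (illustration only).
# A real DSM would derive them from a large text corpus.
VECTORS = {
    "daughter":   [8.0, 1.0, 0.0],
    "married":    [2.0, 9.0, 0.0],
    "child":      [9.0, 1.0, 1.0],
    "spouse":     [3.0, 8.0, 0.0],
    "birthPlace": [1.0, 0.0, 9.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_match(query_term, schema_terms):
    """Return the schema term whose vector is closest to the query term."""
    return max(schema_terms, key=lambda t: cosine(VECTORS[query_term], VECTORS[t]))

schema = ["child", "spouse", "birthPlace"]  # hypothetical schema vocabulary
print(best_match("daughter", schema))
print(best_match("married", schema))
```

With these toy values the query terms land on `child` and `spouse`. Abstraction-level and structural differences need more than term-by-term matching and are not covered by this sketch.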
32. Comparative Analysis
Better recall and query coverage compared to baselines with
equivalent precision.
More comprehensive semantic matching.
33. Distributional Semantics vs WordNet
Distributional semantics provides a more comprehensive
semantic matching
A Distributional Approach for Terminological Semantic Search on the Linked Data
Web, ACM SAC, 2012
34. Large-scale Querying
[Figure: long-tail curve. Axes: frequency of use vs. # of entities and attributes. Regions along the tail: relational, NoSQL, schema-less, unstructured. Annotation: schema-agnostic querying.]
37. Relation/Graph Extraction
Now that we are schema-agnostic ...
From Text to Knowledge Graph
Relations + Context + Entity Linking
Ontology-agnostic
RDF serialization
39. Relation/Graph Extraction
“General Electric Company, or GE, is an American multinational conglomerate corporation incorporated in Schenectady, New York”
A Semantic Best-Effort Approach for Extracting Structured
Discourse Graphs from Wikipedia, WoLE 2012
42. Commonsense Reasoning
Coping with KB incompleteness
- Supporting semantic approximation
Selective (focussed) reasoning
- Selecting the relevant facts in the context of the inference
Acquisition
Scalability
Strategy: Using distributional semantics to solve both the acquisition
and scalability problems
44. Commonsense Reasoning
Does John Smith have a degree?
[Figure: graph linking the instance level to the commonsense KB. Nodes: John Smith, Engineer, learn. Edge labels: occupation (instance level), subjectof.]
45. Selective Reasoning
Does John Smith have a degree?
[Figure: graph linking the instance level to the commonsense KB, annotated “Selective reasoning”. Nodes: John Smith, Engineer, learn, memorization. Edge labels: occupation (instance level), subjectof, is a.]
46. Commonsense Reasoning
Does John Smith have a degree?
[Figure: graph linking the instance level to the commonsense KB. Nodes: John Smith, Engineer, learn, memorization, education. Edge labels: occupation (instance level), subjectof, is a, have or involve.]
47. Commonsense Reasoning
Does John Smith have a degree?
[Figure: graph linking the instance level to the commonsense KB. Nodes: John Smith, Engineer, learn, memorization, education, university. Edge labels: occupation (instance level), subjectof, is a, have or involve, at location.]
48. Coping with Incompleteness
Does John Smith have a degree?
[Figure: graph linking the instance level to the commonsense KB, annotated “Coping with KB incompleteness”. Nodes: John Smith, Engineer, learn, memorization, education, university, college. Edge labels: occupation (instance level), subjectof, is a, have or involve, at location.]
49. Commonsense Reasoning
Does John Smith have a degree?
[Figure: graph linking the instance level to the commonsense KB. Nodes: John Smith, Engineer, learn, memorization, education, university, college, degree. Edge labels: occupation (instance level), subjectof, is a, have or involve, at location, gives.]
A Distributional Semantics Approach for Selective Reasoning on
Commonsense Graph Knowledge Bases, NLDB 2014.
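The idea of selective reasoning, following only the facts that look relevant to the question, can be sketched as a greedy walk over a toy graph. Everything concrete here is an assumption made for illustration: the edge directions and the link from university to degree are one reading of the figure labels, the relatedness scores are invented numbers standing in for distributional relatedness to the query term "degree", and `selective_path` is a hypothetical helper, not the published algorithm.

```python
# Toy graph built from the node and edge labels on the slides.
# Edge directions are an assumption; the slides only show labels.
GRAPH = {
    "John Smith": [("occupation", "Engineer")],
    "Engineer": [("subjectof", "learn")],
    "learn": [("is a", "memorization"), ("have or involve", "education")],
    "education": [("at location", "university"), ("at location", "college")],
    "university": [("gives", "degree")],
}

# Invented relatedness of each node to the query term "degree".
RELATEDNESS_TO_DEGREE = {
    "Engineer": 0.4, "learn": 0.5, "memorization": 0.2,
    "education": 0.7, "university": 0.8, "college": 0.75, "degree": 1.0,
}

def selective_path(graph, relatedness, start, target, max_hops=10):
    """Greedily follow the neighbour most related to the target term."""
    path, node, visited = [start], start, {start}
    for _ in range(max_hops):
        if node == target:
            return path
        candidates = [(rel, nxt) for rel, nxt in graph.get(node, [])
                      if nxt not in visited]
        if not candidates:
            return None  # dead end: no unvisited neighbour to follow
        rel, nxt = max(candidates, key=lambda e: relatedness.get(e[1], 0.0))
        path += [rel, nxt]
        visited.add(nxt)
        node = nxt
    return path if node == target else None

path = selective_path(GRAPH, RELATEDNESS_TO_DEGREE, "John Smith", "degree")
print(" -> ".join(path))
```

At the "learn" node the walk skips "memorization" and follows "education", so most of the KB is never touched. A greedy walk can miss paths that a wider search would find, so it should be read as a best-effort strategy.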
50. Programming in a Schema-agnostic World
Towards An Approximative Ontology-Agnostic Approach for Logic
Programs, FOIKS 2014.
Semantics at Scale: When Distributional Semantics meets Logic
Programming, ALP Newsletter, 2014
51. Programming in a Schema-agnostic World
[Figure: long-tail curve. Axes: frequency of use vs. # of entities and attributes. Regions along the tail: relational, NoSQL, schema-less, unstructured. Annotation: schema-agnostic programs.]
53. Take-away Message
Existing semantic technologies can address major data management problems today.
Multi-disciplinarity is one key:
- NLP + IR + Semantic Web + Databases
Schema-agnosticism is a central property/functionality/goal!
Distributional Semantics + semantics of structured data = schema-agnosticism
Schema-agnosticism brings major impact for information systems.
We can tame the long tail of data variety!
The wave is just starting. Be a part of it!
54. Want to play with Distributional Semantics?
http://easy-esa.org