Ambiguity in interpreting signs is not a new idea, yet the vast majority of research on machine interpretation of signals such as speech, language, images, video, and audio tends to ignore ambiguity. This is evidenced by the fact that metrics for the quality of machine understanding rely on a ground truth, in which each instance (a sentence, a photo, a sound clip, etc.) is assigned a discrete label, or set of labels, and the machine’s prediction for that instance is compared to the label to determine if it is correct. This determination yields the familiar precision, recall, accuracy, and F-measure metrics, but clearly presupposes that such a determination can be made. CrowdTruth is a form of collective intelligence based on a vector representation that accommodates diverse interpretation perspectives and encourages human annotators to disagree with each other, in order to expose latent elements such as ambiguity and worker quality. In other words, CrowdTruth assumes that when annotators disagree on how to label an example, it is because the example is ambiguous, the worker is not doing the task properly, or the task itself is not clear. Previous work on CrowdTruth focused on how the disagreement signals from low-quality workers and from unclear tasks can be isolated. Recently, we observed that disagreement can also signal ambiguity. The basic hypothesis is that, if workers disagree on the correct label for an example, then it will be more difficult for a machine to classify that example. An elaborate data analysis to determine whether the source of the disagreement is ambiguity supports our intuition that low clarity signals ambiguity, while high-clarity sentences quite obviously express one or more of the target relations. In this talk I will share the experiences and lessons learned on the path to understanding diversity in human interpretation and the ways to capture it as ground truth to enable machines to deal with such diversity.
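The contrast between a discrete ground truth and a disagreement-aware representation can be sketched in a few lines of Python. The annotations and relation names below are invented for illustration; this is not the actual CrowdTruth data format.

```python
from collections import Counter

# Hypothetical annotations: 5 workers label one sentence with a relation.
# A discrete ground truth forces a single label; a vector keeps the spread.
annotations = ["treats", "treats", "causes", "treats", "causes"]

# Conventional approach: majority vote collapses disagreement to one label.
majority_label, _ = Counter(annotations).most_common(1)[0]

# Disagreement-aware approach (sketch): keep the full distribution as a vector.
counts = Counter(annotations)
total = sum(counts.values())
label_vector = {label: n / total for label, n in counts.items()}

print(majority_label)  # treats
print(label_vector)    # {'treats': 0.6, 'causes': 0.4}
```

Majority voting throws away the 40% minority reading; the vector keeps it, so downstream evaluation can treat the example as partially ambiguous rather than simply right or wrong.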
2. Web & Media Group
http://lora-aroyo.org @laroyo
Bulgaria
The Netherlands
Sofia
NYC
Personal
Semantics
3.
Riva del Garda, Italy, 2014
Semantic
Social Life
4.
To understand the value of
Semantic Web for e-learning
you have to understand people,
e.g. how they learn, interact &
consume information
6.
To understand the value of Semantic Web
for cultural heritage
you have to understand people, e.g.
how they interact & consume information
8.
To understand the value of Semantic Web
for digital humanities, you have to
understand people, e.g.
how they interact & consume information
9.
people are in the center of everything
people & their semantics, i.e. their real-world behavior,
online interactions, information needs, information
consumption habits, personal preferences ...
10.
CrowdTruth team
12.
A long time ago
in a galaxy far, far away …
’50s: AI more or less begins
......
’80s: expert systems
’90s: knowledge acquisition from experts
’00s: standards & interoperability
’10s: big data & large crowds
14.
’80s - empire of the experts
Advances in hardware and SDEs: PCs, workstations, Symbolics, Sun; new architectures like the Hypercube; LISP, Prolog, OPS. AI can now BUILD SYSTEMS.
Primary focus on experts and rules:
What is the knowledge of experts? What is the form of this knowledge? Graphs, logic, rules, frames
How do experts reason? Deduction, induction
Work on form & process (what happened inside the system, to make the reasoning inside the system proper and as good as possible) remained academic; industry forged ahead with ad-hoc & proprietary systems and actually tried to build expert systems.
Origins of uncertain KR: fuzzy, probabilistic
15.
Piero Bonissone and the DELTA/CATS expert system for locomotive repair, with David Smith, a locomotive repair expert.
Buchanan and Shortliffe’s MYCIN project at Stanford built a huge rule base for medical diagnosis, working with an extensive team of medical experts.
18.
90’s - knowledge acquisition from experts
The ’90s brought attention to knowledge acquisition. Knowing that expert systems could by then functionally work, the focus, in practice as well as in scientific research and technology development, shifted to the then-bigger challenge of how to acquire knowledge in real-world scenarios.
It seems natural that, after the look inside the systems, one needed to pay attention to how to actually get the knowledge from the world outside and frame it into the proper structured knowledge for inside the system.
Dream of the 90’s
27.
the semantic
comfort
zone
28.
One truth: knowledge acquisition for the semantic web
assumes one correct interpretation for every example
All examples are created equal: triples are triples, one is not
more important than another, they are all either true or false
Disagreement bad: when people disagree, they don’t
understand the problem
Experts rule: knowledge is captured from domain experts
One is enough: knowledge by a single expert is sufficient
Detailed explanations help: if examples cause disagreement
- add instructions
Once done, forever valid: knowledge is not updated; new
data not aligned with old
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
29.
Use Case:
video archive
enrichment
Search Behavior of Media Professionals at an Audiovisual Archive:
A Transaction Log Analysis (2009).
B. Huurnink, L. Hollink, W. van den Heuvel, M. de Rijke.
30.
Use Case:
video archive
enrichment
Goal:
make the
multimedia content of
Dutch National Video Archive
accessible to large audiences
Comfort Zone Solution:
media professionals watch & annotate videos. Of course!
31.
but ...
expensive & doesn’t scale
time-consuming: about 5 times the video duration
professional vocabulary: experts use a specific vocabulary that is unknown to general audiences
32.
… and
people search for fragments, but experts annotate full videos
not finding: 35% of search queries result in no matches
33.
Use Case:
real world QA
for Watson
“Crowdsourcing Ground Truth for Question Answering using CrowdTruth” (2015), B. Timmermans, L. Aroyo, C. Welty
34.
Goal:
gather questions
that real people ask
for training & evaluating Watson
Data:
30K Questions + Candidate Answers from Yahoo! Answers
Comfort Zone Solution:
ask people if the passage answers the question (Y/N). Simple!
Use Case:
real world QA
for Watson
35.
Contradicting evidence
Is Coral a plant?
• “Coral almost could be considered half-plant [..]”
• “[..] organism, such as a coral, resembling a stony plant.”
Unanswerable questions
• Can I take a pill if you don't have a child yet?
• Is the spelling for being drunk right?
• Is napster black?
Unclear answer type
• Is paper animal plant or man made?
Multiple right answers to a question
• What is the best university in NY? (subjective)
YES or NO?
36.
Use Case:
medical relation
extraction
for Watson
“Crowdsourcing Ground Truth for Medical Relation Extraction” (2017), A. Dumitrache, L. Aroyo, C. Welty
37.
Goal:
gather data to train
Watson to read
medical text & automatically
extract a medical relations KB
Comfort Zone Solution:
having medical experts read & annotate examples
Use Case:
medical relation
extraction
for Watson
38.
ANTIBIOTICS are the first line treatment for
indications of TYPHUS.
treats(ANTIBIOTICS, TYPHUS)? Expert: yes
Patients with TYPHUS who were given ANTIBIOTICS
exhibited side-effects.
treats(ANTIBIOTICS, TYPHUS)? Expert: yes
With ANTIBIOTICS in short supply, DDT was used
during WWII to control the insect vectors of
TYPHUS.
treats(ANTIBIOTICS, TYPHUS)? Expert: yes.
Are these three really all the same???
39.
Use Case:
map music to moods
40.
Use Case:
map music to moods
Goal:
annotate songs with emotional tags
Comfort Zone Solution:
people assign the prevalent mood of a song
41.
Which is the mood most appropriate for each song? Choose one:
Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
Other: does not fit into any of the 5 clusters
(Lee and Hu 2012)
Goal: 1 song - 1 mood???
42.
One truth: knowledge acquisition for the semantic web
assumes one correct interpretation for every example
All examples are created equal: triples are triples, one is not
more important than another, they are all either true or false
Disagreement bad: when people disagree, they don’t
understand the problem
Experts rule: knowledge is captured from domain experts
One is enough: knowledge by a single expert is sufficient
Detailed explanations help: if examples cause disagreement
- add instructions
Once done, forever valid: knowledge is not updated; new
data not aligned with old
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
43.
Semantic
Comfort Zone
44.
Semantic
Comfort Zone
disrupted
46.
interestingly …
47.
1785
Marquis de Condorcet
“wisdom of crowds”
• collective decisions of large groups of people
• a group of error-prone decision-makers can be surprisingly good at picking the best choice
• with a thumbs up or thumbs down vote, each voter’s chance of picking the right answer needs to be > 50%
• the odds that most of them pick the right answer are then greater than the odds that any one of them picks it alone
• performance gets better as group size grows
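Condorcet's observation can be checked directly with the binomial distribution. A minimal sketch (exact for odd group sizes; ties for even sizes are ignored here):

```python
from math import comb

def majority_correct(n, p):
    """Probability that a majority of n independent voters,
    each correct with probability p, picks the right answer."""
    k_needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_needed, n + 1))

# Each voter only slightly better than chance (p = 0.6),
# yet the majority grows ever more reliable as the group grows.
for n in (1, 11, 101):
    print(n, round(majority_correct(n, 0.6), 3))
```

With p = 0.6 a single voter is right 60% of the time, while a majority of 101 such voters is right well over 95% of the time, matching the claim that performance improves with group size.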
48.
1906
Sir Francis Galton
“wisdom of crowds”
• asked 787 people to guess the weight of an ox
• none got the right answer
• their collective guess was almost perfect
49.
WWII Math Rosies
1942: Ballistics calculations and flight trajectories
50.
NASA’s Computer Room
transcribe raw flight data from celluloid film & oscillograph paper
51.
can we harness it?
53.
CrowdTruth
Three basic causes of disagreement: workers,
examples, target semantics
Disagreement is signal, not noise.
It is indicative of the variation in human semantic
interpretation
It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality
“CrowdTruth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data” (2014), O. Inel, A. Dumitrache, L. Aroyo, C. Welty
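As a rough sketch of the vector idea: each worker's annotation of a sentence becomes a binary vector over the candidate relations, the sentence vector is their element-wise sum, and a sentence-relation score is the cosine between the sentence vector and the relation's unit vector. The worker vectors and relation names below are invented for illustration, not the actual CrowdTruth data.

```python
import math

relations = ["treats", "causes", "none"]

# Hypothetical worker vectors for one sentence
# (1 = the worker selected that relation).
worker_vectors = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],  # this worker picked two relations
    [0, 1, 0],
    [1, 0, 0],
]

# Sentence vector: element-wise sum of the worker vectors.
sentence_vector = [sum(col) for col in zip(*worker_vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Sentence-relation score: cosine between the sentence vector
# and each relation's unit vector.
for i, rel in enumerate(relations):
    unit = [1 if j == i else 0 for j in range(len(relations))]
    print(rel, round(cosine(sentence_vector, unit), 2))
```

A score near 1 means the workers converge on that relation; intermediate scores expose exactly the kind of disagreement the slides treat as signal.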
54.
one truth: multiple truths
all examples are created equal:
each example is unique
disagreement bad: disagreement is good
experts rule: crowd rules
one is enough: the more the better
detailed explanations help:
keep it simple stupid
once done, forever valid:
maintenance is necessary
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
55.
changes needed
video archive
enrichment
improve support
for fragment search
time-based annotations
bridging vocabulary gap between
searcher & cataloguer
56.
crowdsourcing
video tagging
two
video tagging pilots
57.
@waisda
http://waisda.nl
engage crowds through continuous gaming
59.
time-based
bernhard
just “tags”
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
60.
objects (57%)
westminster abbey
abbey
priester (priest)
geestelijken (clergy)
hek (fence)
paarden (horses)
tocht (procession)
aankomst (arrival)
koets (carriage)
kroning (coronation)
mensenmassa (crowd)
parade
kroon (crown)
regen (rain)
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
61.
persons (31%)
bernhard
juliana
objects (57%)
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
62.
user vocabulary
8% in professional vocabulary
23% in Dutch lexicon
89% found on Google
locations (7%): engeland (England)
persons (31%)
objects (57%)
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
63.
describe mainly short segments
often not very specific
don’t describe programmes as a whole
“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011
64.
crowdsourcing
medical relation
extraction
diversity of opinions
independent perspectives
multitude of contexts
we exposed a richer set of possibilities
that help in identifying, processing
& understanding context
65.
Does this sentence express TREATS(Antibiotics, Typhus)?
• “ANTIBIOTICS are the first line treatment for indications of TYPHUS.” 95%
• “Patients with TYPHUS who were given ANTIBIOTICS exhibited several side-effects.” 75%
• “With ANTIBIOTICS in short supply, DDT was used during World War II to control the insect vectors of TYPHUS.” 50%
The crowd results capture the natural ambiguity
66.
What is the relation between the highlighted terms?
He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY.
experts: cause
crowd: no relation
Experts hallucinate; the crowd reads the text literally and provides better examples to the machine
67.
Unclear relationship between the two arguments reflected
in the disagreement
Medical Relation Extraction
68.
Clearly expressed relation between the two arguments reflected in
the agreement
Medical Relation Extraction
69.
Unclear relationship between the two arguments reflected
in the disagreement
Medical Relation Extraction
71.
Learning Curves
(crowd with pos./neg. threshold at 0.5)
above 400 sentences: crowd consistently above baseline & single annotator
above 600 sentences: crowd outperforms experts
72.
Learning Curves Extended
(crowd with pos./neg. threshold at 0.5)
crowd consistently performs better than baseline
74.
Training a Relation Extraction Classifier

Annotation source    F1     Cost per sentence
CrowdTruth           0.642  $0.66
Expert Annotator     0.638  $2.00
Single Annotator     0.492  $0.08

“wisdom of the crowd” provides training data that is at least as good as, if not better than, expert data, but only with a proper analytic framework for harnessing disagreement from the crowd
75.
map music to moods
Goal:
tag songs with emotional clusters
Comfort Zone Solution:
people assign the prevalent mood of a song
77.
Is this song ….
Passionate, Rousing, Confident, Boisterous, Rowdy,
Rollicking, Cheerful, Fun, Sweet, Amiable, Good-natured,
Literate, Poignant, Wistful, Bittersweet, Autumnal, Brooding,
Humorous, Silly, Campy, Whimsical, Witty, Wry,
Aggressive, Fiery, Tense, Anxious, Intense, Volatile?
80.
Disagreement as Signal

Worker   Mood-C1  Mood-C2  Mood-C3  Mood-C4  Mood-C5  Other
W10      1        1        1        1        1
Totals   3        5        6        5        2        8

can indicate alternative interpretations
can indicate ambiguity in the categorisation
can indicate low quality workers
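A small sketch of how such a worker-by-cluster matrix can be mined for both signals: the spread of the column totals (here via normalized entropy) hints at ambiguity in the categorisation, while a worker who ticks nearly every cluster is a low-quality candidate. The matrix below is invented, apart from the W10 pattern of selecting everything.

```python
import math

clusters = ["C1", "C2", "C3", "C4", "C5", "Other"]

# Hypothetical annotation matrix for one song
# (1 = the worker assigned that mood cluster).
matrix = {
    "W1":  [1, 0, 0, 0, 0, 0],
    "W2":  [0, 1, 0, 0, 0, 0],
    "W3":  [0, 1, 1, 0, 0, 0],
    "W4":  [0, 0, 1, 0, 1, 0],
    "W10": [1, 1, 1, 1, 1, 0],  # ticks almost everything: low-quality signal
}

# Column totals: how often each cluster was chosen.
totals = [sum(row[i] for row in matrix.values())
          for i in range(len(clusters))]

def normalized_entropy(counts):
    """0 = everyone agrees on one cluster, 1 = maximal spread."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(counts))

print(totals)
print(round(normalized_entropy(totals), 2))  # 0.86

# Workers who tick nearly every cluster at once may be low quality.
suspects = [w for w, row in matrix.items()
            if sum(row) >= len(clusters) - 1]
print(suspects)  # ['W10']
```

Real CrowdTruth worker metrics are more elaborate (they compare each worker against the rest of the crowd), but even this toy version separates "the song is genuinely ambiguous" from "this worker is guessing".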
83.
Take Home Message
People first, experts second
True and False are not enough; there is diversity in human interpretation
CrowdTruth introduces a spatial representation of meaning that harnesses disagreement
With CrowdTruth, untrained workers can be just as reliable as highly trained experts