UAB 2011- Combining human and computational intelligence

Combining Human and Computational Intelligence
Ilya Zaihrayeu, Pierre Andrews, Juan Pane

Semantic annotation lifecycle
• Problem 1: help the user find and understand the meaning of semantic annotations
• Problem 2: extract (semantic) annotations from contexts of user resource at publishing
• Problem 3: QoS of semantics-enabled services
• Problem 4: semi-automatic semantification of existing annotations
• Free text annotations: what if the users could use semantic annotations instead, to leverage semantic technology services?
• Semantic annotation = structure and/or meaning
[Diagram labels: User, Context, Semantic search, …, Reasoning]
4/14/2011 2

Index: meaning summarization

Problem 1: help the user find and understand the meaning of semantic annotations


Meaning summarization: why?
• The right meaning of the words used for the annotation is in the mind of the people using them
• E.g.: Java:
– island: an island in Indonesia south of Borneo; one of the world's most densely populated regions
– beverage: a beverage consisting of an infusion of ground coffee beans; "he ordered a cup of coffee"
– programming language: a simple platform-independent object-oriented programming language used for writing applets that are downloaded from the World Wide Web by a client and run on the client's machine
• Descriptions are too long for the user to grasp the meaning immediately; this is too high a barrier to start generating semantic annotations


Meaning summarization: an example

One-word summaries are generated from the relations in the knowledge base, sense definitions, synonyms and hypernym terms
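
The idea of a one-word summary can be sketched as follows. This is only an illustration, not the algorithm evaluated in the talk: the `SENSES` inventory is hand-written for the "java" example, and the heuristic (take the first hypernym or synonym that no other sense of the word could also use) is an assumption made for the sketch.

```python
# Hypothetical sense inventory for "java"; a real system would read these
# fields from a knowledge base such as WordNet.
SENSES = {
    "java#1": {"synonyms": [], "hypernyms": ["island"]},
    "java#2": {"synonyms": ["coffee"], "hypernyms": ["beverage"]},
    "java#3": {"synonyms": [], "hypernyms": ["programming language"]},
}


def summarize(senses):
    """Pick one short label per sense, preferring labels no other sense shares."""
    summaries = {}
    for sense_id, sense in senses.items():
        # Candidate labels in an assumed order of preference.
        candidates = sense["hypernyms"] + sense["synonyms"]
        # Labels that other senses could also use do not discriminate.
        taken = {
            label
            for other_id, other in senses.items()
            if other_id != sense_id
            for label in other["hypernyms"] + other["synonyms"]
        }
        unique = [label for label in candidates if label not in taken]
        if unique:
            summaries[sense_id] = unique[0]
        else:
            # Fall back to a shared label, or None when nothing is available.
            summaries[sense_id] = candidates[0] if candidates else None
    return summaries


summaries = summarize(SENSES)
```

With this toy inventory the three senses of "java" receive the labels "island", "beverage" and "programming language", which are the short labels used in the Java example.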


Meaning summarization: evaluation results
• Best precision: 63%
• Discriminating power: 76.4% (if we talk about java, does the word coffee mean the same as island?)


Index: gold standard dataset

In order to evaluate the performance of the algorithms, a gold standard dataset is needed

Proposed Approach
Create a gold standard of folksonomy with sense

# of annotations: 4 296
Unique tags: 857
Unique URLs: 644
Unique users: 1 194

Tag → (Preprocessing) → Tokens → (Disambiguation) → Senses
• Preprocessing: 80% accuracy
• Disambiguation: 59% accuracy
• Annotator agreement: 81%

Example:
• Tag: javaisland
• Tokens: Java island, Java is land, …
• Senses: Java (an island in Indonesia to the south of Borneo); Island (a land mass that is surrounded by water)
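
The preprocessing step in the "javaisland" example can be illustrated with a small sketch. It assumes a toy vocabulary (`VOCAB`, hypothetical) and simply enumerates every way to cut the tag into known words; how the platform actually ranks or filters the candidate splits is not described on the slide.

```python
def split_tag(tag, vocabulary):
    """Return every way to split `tag` into words that are all in `vocabulary`."""
    tag = tag.lower()
    if not tag:
        return [[]]  # one way to split the empty string: no words
    splits = []
    for end in range(1, len(tag) + 1):
        head = tag[:end]
        if head in vocabulary:
            # Keep this head and split the remainder recursively.
            for rest in split_tag(tag[end:], vocabulary):
                splits.append([head] + rest)
    return splits


# Hypothetical vocabulary; a real one would come from the knowledge base.
VOCAB = {"java", "island", "is", "land"}
candidates = split_tag("javaisland", VOCAB)
```

For "javaisland" the sketch yields the two tokenizations from the slide, "java island" and "java is land"; choosing between them is left to the disambiguation step or to the human annotator.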

A Platform for Gold Standards of Semantic Annotation Systems
• Manual validation
• RDF export
• Evaluation of
– Preprocessing
– WSD
– BoW Search
– Convergence
• Open source: 7 modules, 25K lines of code, 26% of comments
http://sourceforge.net/projects/tags2con/


Delicious RDF Dataset @ LOD cloud

# triples: 85 908
Outlinks to LOD cloud (WN synsets): 651
Dereferenceable at: http://disi.unitn.it/~knowdive/dataset/delicious/

Index: QoS for semantic search


Semantic search: why?
• With free text search, the following problems may reduce precision and recall:
– synonymy problem: searching for “images” should return resources annotated with “picture”
– polysemy problem: searching for “java” (island) should not return resources annotated with “java” (coffee beverage)
– specificity gap problem: searching for “animals” should also return resources annotated with “dogs”
• Semantic, meaning-based search can address the problems listed above
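
The three problems above can be reproduced on a toy collection. The sketch below is an illustration only: the concept identifiers, the `IS_A` table and the `RESOURCES` list are invented for the example, and each resource is assumed to already carry the concept its annotator meant.

```python
# Hypothetical is-a links between concepts (child -> parent).
IS_A = {"dog#1": "animal#1"}

# Each resource has a free-text tag and the concept the annotator meant.
RESOURCES = [
    {"id": "r1", "tag": "picture", "concept": "picture#1"},
    {"id": "r2", "tag": "java", "concept": "java#coffee"},
    {"id": "r3", "tag": "java", "concept": "java#island"},
    {"id": "r4", "tag": "dogs", "concept": "dog#1"},
]


def free_text_search(query_word):
    """Match the query string against the tag string only."""
    return [r["id"] for r in RESOURCES if r["tag"] == query_word]


def self_and_ancestors(concept):
    """Return the concept followed by all its more general concepts."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def semantic_search(query_concept):
    """Match resources whose concept is the query concept or more specific."""
    return [
        r["id"]
        for r in RESOURCES
        if query_concept in self_and_ancestors(r["concept"])
    ]
```

Here `free_text_search("images")` misses r1 (synonymy), `free_text_search("java")` returns both r2 and r3 (polysemy), and `free_text_search("animals")` misses r4 (specificity gap), while the concept-based queries `picture#1`, `java#island` and `animal#1` return r1, r3 and r4 respectively.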


Semantics vs Folksonomy
• javaisland: used to build “raw” queries
• java island: used to build BoW queries
• Java(island) island(land): used to build semantic queries (correct and complete)
• Semantic search: complete and correct results (the baseline)

Specificity Gap (SG)
• The user submits the query “vehicle”
• A resource annotated with “car” is at SG=1
• A resource annotated with “taxi” is at SG=2
• Recall goes down as the specificity gap increases
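
The specificity gap in the vehicle/car/taxi example can be read as the number of is-a links between the query concept and the annotation concept. The sketch below follows that reading; the `IS_A` table and the `recall` helper are assumptions made for illustration and are not the evaluation code used in the talk.

```python
# Hypothetical is-a links (child -> parent) for the slide's example.
IS_A = {"taxi": "car", "car": "vehicle"}


def specificity_gap(query, annotation):
    """Number of is-a links from `annotation` up to `query`, or None if unrelated."""
    gap, node = 0, annotation
    while node != query:
        if node not in IS_A:
            return None  # reached a root without meeting the query concept
        node = IS_A[node]
        gap += 1
    return gap


def recall(query, annotations, max_gap):
    """Recall of a search that can only bridge up to `max_gap` is-a links."""
    gaps = [specificity_gap(query, a) for a in annotations]
    relevant = [g for g in gaps if g is not None]
    if not relevant:
        return 0.0
    found = [g for g in relevant if g <= max_gap]
    return len(found) / len(relevant)


annotations = ["vehicle", "car", "taxi", "boat"]
```

With these annotations, a search for "vehicle" that follows no is-a links finds one of the three relevant resources, one that follows a single link finds two, and one that follows two links finds all three. "boat" has no link to "vehicle" in this toy table, so it is not counted as relevant.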

Index: semantic convergence
Problem 4: semi-automatic semantification of existing annotations

Semantic convergence: Why?

Category          Programming and web domain    Random “General” domains (cooking, travel, education)
With a WN sense   49%                           71%
Missing sense     36%                           15%
Cannot decide     6%                            5%
Abbreviation      5%                            2%
I don't know      3%                            4%
Other             1%                            3%

Examples: Ajax, Mac, Apple, CSS, …

Semantic convergence: proposed solution
• Find new senses of terms
– Find different senses of the same term (word senses)
– Find synonyms of a term (synonym sets, or synsets)
• Place the new synset in the is-a hierarchy of the vocabulary
• What we improve
– Better use of Machine Learning techniques
– We address the polysemy issue, which is not considered in the state of the art
– We provide an evaluation, where the state of the art has missing or “subjective” evaluations
• Evaluation using the Delicious dataset


Convergence Evaluation: Finding Senses
[Diagrams: Tag Collocation and User Collocation, with nodes t1–t5, B1–B4, U1, U2]

             Tag Collocation    Random Baseline    User Collocation
Precision    56%                42%                57%
Recall       73%                29%                68%
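
One way to picture sense finding through tag collocation is to group the bookmarks that use a tag by the other tags they carry. The sketch below is a simplified illustration under stated assumptions: the greedy Jaccard grouping, the `threshold` value and the `BOOKMARKS` data are all invented here, and the talk does not specify the clustering method behind the reported figures.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_senses(tag, bookmarks, threshold=0.3):
    """Greedily group the bookmarks that use `tag` by overlap of their other tags."""
    clusters = []  # each cluster: {"bookmarks": [...], "context": set of co-tags}
    for bookmark_id, tags in bookmarks.items():
        if tag not in tags:
            continue
        context = set(tags) - {tag}
        for cluster in clusters:
            if jaccard(context, cluster["context"]) >= threshold:
                cluster["bookmarks"].append(bookmark_id)
                cluster["context"] |= context
                break
        else:
            # No existing cluster is similar enough: start a new candidate sense.
            clusters.append({"bookmarks": [bookmark_id], "context": set(context)})
    return [c["bookmarks"] for c in clusters]


# Hypothetical bookmarks, each with its set of tags.
BOOKMARKS = {
    "B1": {"java", "indonesia", "travel"},
    "B2": {"java", "travel", "island"},
    "B3": {"java", "programming", "code"},
    "B4": {"java", "code", "applet"},
}
senses = induce_senses("java", BOOKMARKS)
```

On this toy data the tag "java" splits into two groups of bookmarks, B1 and B2 (travel context) and B3 and B4 (programming context), each of which would be a candidate sense. A user collocation variant could group bookmarks by the users who created them instead of by their co-occurring tags.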


Semantic annotation lifecycle: combining human and computational intelligence
• Problem 1: help the user understand the meaning of semantic annotations?
• Problem 2: extract (semantic) annotations from contexts of user resource at publishing?
• Problem 3: QoS of semantics-enabled services?
• Problem 4: semi-automatic semantification of existing annotations
• Free text annotations: what if the users could use semantic annotations instead, to leverage semantic technology services?
• Semantic annotation = structure and/or meaning
[Diagram labels: User, Context, Semantic search, …, Reasoning]

Conclusions
• We developed and evaluated a meaning summarization algorithm
• We developed a “semantic folksonomy” evaluation platform
• We studied the effect of semantics on social tagging systems:
– how much can semantics help?
– how much does the user need to be involved?
– how can human and computer intelligence be combined in the generation and consumption of semantic annotations?
• We developed and evaluated a knowledge base enrichment
algorithm
• We built and used a gold standard dataset for evaluating:
– Word Sense Disambiguation
– Tag Preprocessing
– Semantic Search
– Semantic Convergence


Integration with the use cases

Publications
• Semantic Disambiguation in Folksonomy: a Case Study
Pierre Andrews, Juan Pane, and Ilya Zaihrayeu;
Advanced Language Technologies for Digital Libraries, Springer's LNCS
• Semantic Annotation of Images on Flickr
Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu;
ESWC 2011
• A Classification of Semantic Annotation Systems
Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu;
Semantic Web Journal (second review phase)
• Sense Induction in Folksonomies
Pierre Andrews, Juan Pane, and Ilya Zaihrayeu;
IJCAI-LHD 2011 (under review)
• Evaluating the Quality of Service in Semantic Annotation Systems
Ilya Zaihrayeu, Pierre Andrews, and Juan Pane;
in preparation

WP 2 TIMELINE AND DELIVERABLES
Months: 0 6 12 18 24 30 36

Tasks
• Task 2.1: Designing models (UIBK)
– D2.1.1: State of the Art and requirements from the use case partners
– D2.1.2: Specification of the model
• Task 2.2: Designing methods (UNITN)
– D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics
– D2.2.2+D2.2.3: Report on linking semantic annotations to external sources and on keeping them up-to-date when the underlying semantic model changes
• Task 2.3: Research on Information Retrieval (IR) methods for semantic content (ONTO)
– D2.3.1: Requirements for semantics-aware IR methods
– D2.3.2: Specification for semantics-aware IR methods
• Task 2.4: Models and methods for automatic visual annotation (UTC)
– D2.5: Report on the state of the art, proposed suitable models and methods for automatic visual annotation
• D2.4: Report on the refinement of the proposed models, methods and semantic search
