2. Semantic annotation lifecycle
free text annotations
What if the users could use semantic annotations instead to leverage semantic technology services?
Semantic annotation = structure and/or meaning
Problem 1: help the user find and understand the meaning of semantic annotations
Problem 2: extract (semantic) annotations from contexts of user resource at publishing
Problem 3: QoS of semantics-enabled services
Problem 4: semi-automatic semantification of existing annotations
[Diagram: User, Context, Semantic search, …, Reasoning]
4/14/2011 2
3. Index: meaning summarization
Problem 1: help the user find and understand the meaning of semantic annotations
[Diagram: User, Semantic search, …, Reasoning]
4. Meaning summarization: why?
• The right meaning of the words used for an annotation is in the mind of the people using them
• E.g.: Java:
– island: an island in Indonesia south of Borneo; one of the world's most densely populated regions
– beverage: a beverage consisting of an infusion of ground coffee beans; "he ordered a cup of coffee"
– programming language: a simple platform-independent object-oriented programming language used for writing applets that are downloaded from the World Wide Web by a client and run on the client's machine
• Descriptions are too long for the user to grasp the meaning immediately, which raises the barrier to start generating semantic annotations
5. Meaning summarization: an example
One-word summaries are generated from the relations in the knowledge base, sense definitions, synonyms, and hypernym terms
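A minimal sketch of how such a summary could be picked, assuming a toy knowledge base. The `SENSES` dictionary, its entries, and the `one_word_summary` helper are hypothetical and are not the evaluated algorithm: each sense offers its hypernyms and synonyms as candidates, and the first candidate that no other sense of the same word shares is returned.

```python
# Hypothetical toy knowledge base: each sense of a word lists its gloss,
# hypernym terms and synonyms (illustrative entries only).
SENSES = {
    "java": [
        {"gloss": "an island in Indonesia south of Borneo",
         "hypernyms": ["island"], "synonyms": []},
        {"gloss": "a beverage consisting of an infusion of ground coffee beans",
         "hypernyms": ["beverage"], "synonyms": ["coffee"]},
        {"gloss": "a simple platform-independent object-oriented programming language",
         "hypernyms": ["programming language"], "synonyms": []},
    ]
}


def one_word_summary(word, sense_index, kb=SENSES):
    """Pick a short label that tells one sense of `word` apart from the others."""
    senses = kb[word]
    target = senses[sense_index]
    candidates = target["hypernyms"] + target["synonyms"]

    # Terms used by the other senses cannot discriminate between meanings.
    taken = set()
    for i, sense in enumerate(senses):
        if i != sense_index:
            taken.update(sense["hypernyms"])
            taken.update(sense["synonyms"])

    for candidate in candidates:
        if candidate not in taken:
            return candidate
    # Fall back to the first candidate, or to the word itself.
    return candidates[0] if candidates else word


summaries = [one_word_summary("java", i) for i in range(3)]
print(summaries)
```

Preferring hypernyms over synonyms is an arbitrary choice made for this sketch; a real system would rank candidates using more of the knowledge base.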
6. Meaning summarization: evaluation results
Best precision: 63%
If we talk about java, does the word coffee mean the same as island?
Discriminating power: 76.4%
7. Index: gold standard dataset
In order to evaluate the performance of the algorithms, a gold standard dataset is needed
Problem 3: QoS of semantics-enabled services?
Problem 4: semi-automatic semantification of existing annotations
[Diagram: User, Semantic search, …, Reasoning]
8. Proposed Approach
Create a gold standard folksonomy dataset annotated with senses
Dataset:
• # of annotations: 4 296
• Unique tags: 857
• Unique URLs: 644
• Unique users: 1 194
Pipeline: Tag → (Preprocessing) → Tokens → (Disambiguation) → Senses
Annotator Agreement
80% Accuracy 81 %
59% Accuracy
Example:
• Tag: javaisland
• Candidate tokens: "Java island", "Java is land", …
• Senses: Java – an island in Indonesia to the south of Borneo; Island – a land mass that is surrounded by water
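The preprocessing step in the example can be illustrated with a small sketch that enumerates every way to split a concatenated tag into known words. The `split_tag` helper and the four-word `VOCABULARY` are assumptions made for illustration; the platform's actual preprocessing algorithm is not described on the slide.

```python
def split_tag(tag, vocabulary):
    """Return every way to split `tag` into words found in `vocabulary`."""
    tag = tag.lower()
    if not tag:
        return [[]]  # one way to split the empty string: no tokens
    splits = []
    for end in range(1, len(tag) + 1):
        head = tag[:end]
        if head in vocabulary:
            # Keep this prefix and split the remainder recursively.
            for rest in split_tag(tag[end:], vocabulary):
                splits.append([head] + rest)
    return splits


# Hypothetical vocabulary; a real system would use a full lexicon.
VOCABULARY = {"java", "island", "is", "land"}

candidates = split_tag("javaisland", VOCABULARY)
for tokens in candidates:
    print(" ".join(tokens))
```

The sketch returns both "java is land" and "java island"; choosing between them, and then choosing a sense for each token, is what the manual validation and disambiguation steps decide.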
9. A Platform for Gold Standards of Semantic Annotation Systems
• Manual validation
• RDF export
• Evaluation of
– Preprocessing
– WSD
– BoW Search
– Convergence
• Open source: 7 modules, 25K lines of code, 26% of comments
http://sourceforge.net/projects/tags2con/
11. Index: QoS for semantic search
Problem 3: QoS of semantics-enabled services?
[Diagram: User, Semantic search, …, Reasoning]
12. Semantic search: why?
• With free-text search, the following problems may reduce precision and recall:
– synonymy problem: searching for “images” should return resources annotated with “picture”
– polysemy problem: searching for “java” (island) should not return resources annotated with “java” (coffee beverage)
– specificity gap problem: searching for “animals” should also return resources annotated with “dogs”
• Semantic, meaning-based search can address the problems listed above
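The three problems can be reproduced on a toy collection. The sketch below is illustrative only: the concept identifiers (such as `java#island`), the `RESOURCES` and `IS_A` tables, and both search functions are hypothetical names, not part of the evaluated system. It compares exact keyword matching with matching on concepts and their is-a ancestors.

```python
# Hypothetical annotated resources: each annotation is (tag, concept id).
RESOURCES = {
    "r1": [("picture", "picture#1")],
    "r2": [("java", "java#coffee")],
    "r3": [("java", "java#island")],
    "r4": [("dogs", "dog#1")],
}

# Hypothetical is-a hierarchy between concepts (child -> parent).
IS_A = {"dog#1": "animal#1"}


def free_text_search(word):
    """Return resources whose tag string equals the query word."""
    return sorted(r for r, anns in RESOURCES.items()
                  if any(tag == word for tag, _ in anns))


def ancestors(concept):
    """Return the concept followed by all of its is-a ancestors."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def semantic_search(concept):
    """Return resources annotated with the concept or a more specific one."""
    return sorted(r for r, anns in RESOURCES.items()
                  if any(concept in ancestors(c) for _, c in anns))


# Synonymy: "images" and "picture" are assumed to share the concept picture#1.
print(free_text_search("images"), semantic_search("picture#1"))
# Polysemy: the keyword cannot tell the island from the beverage.
print(free_text_search("java"), semantic_search("java#island"))
# Specificity gap: "dogs" is more specific than "animals".
print(free_text_search("animals"), semantic_search("animal#1"))
```

The semantic variant presupposes that both the query and the annotations have already been disambiguated to concepts.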
13. Semantics vs Folksonomy
• Tag “javaisland”: used to build “raw” queries
• Tokens “java island”: used to build BoW queries
• Senses “Java(island) island(land)”: used to build semantic queries (correct and complete)
Semantic search: complete and correct results (the baseline)
Specificity Gap (SG)
[Diagram: the user submits the query “vehicle”; the result resource is linked to an annotation that is more specific than the query: “car” at SG=1, “taxi” at SG=2]
Recall goes down as the specificity gap increases
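The specificity gap from the diagram can be sketched as the number of is-a links between the annotation concept and the query concept. The `IS_A` table and the two helpers below are hypothetical, and `recall_at_gap` models a search that only reaches annotations up to `max_gap` links below the query, which is an assumption made for illustration.

```python
# Hypothetical is-a hierarchy from the slide's example (child -> parent).
IS_A = {"taxi": "car", "car": "vehicle"}


def specificity_gap(query, annotation):
    """Count is-a steps from `annotation` up to `query`; None if unrelated."""
    gap, node = 0, annotation
    while node != query:
        if node not in IS_A:
            return None
        node = IS_A[node]
        gap += 1
    return gap


def recall_at_gap(query, annotations, max_gap):
    """Share of relevant annotations reachable within `max_gap` is-a links."""
    relevant = [a for a in annotations
                if specificity_gap(query, a) is not None]
    found = [a for a in relevant if specificity_gap(query, a) <= max_gap]
    return len(found) / len(relevant) if relevant else 0.0


annotations = ["vehicle", "car", "taxi", "island"]
for max_gap in range(3):
    print(max_gap, recall_at_gap("vehicle", annotations, max_gap))
```

With `max_gap=0` the search behaves like exact matching and misses both “car” and “taxi”; each extra level of expansion recovers the annotations at that gap.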
14. Index: semantic convergence
Problem 4: semi-automatic semantification of existing annotations
[Diagram: User, Semantic search, …, Reasoning]
15. Semantic convergence: Why?
[Two pie charts, one per domain]
Programming and web domain:
• With a WN sense: 49%
• Missing sense: 36% (e.g. Ajax, Mac, Apple, CSS, …)
• Cannot decide: 6%
• Abbreviation: 5%
• I don't know: 3%
• Other: 1%
Random “General” domains (cooking, travel, education):
• With a WN sense: 71%
• Missing sense: 15%
• Cannot decide: 5%
• I don't know: 4%
• Other: 3%
• Abbreviation: 2%
16. Semantic convergence: proposed solution
• Find new senses of terms
– Find different senses of the same term (word senses)
– Find synonyms of a term (synonym sets, or synsets)
• Place the new synset in the vocabulary's is-a hierarchy
• What we improve
– Better use of Machine Learning techniques
– The polysemy issue is not considered in the state of the art
– Missing or “subjective” evaluations in the state of the art
• Evaluation using the Delicious dataset
17. Convergence Evaluation: Finding Senses
Tag Collocation User Collocation
[Diagrams: tags t1–t5 connected through bookmarks B1–B4 (tag collocation) and through users U1 and U2 (user collocation)]
Random Baseline
Precision: 56% Precision: 42% Precision: 57%
Recall: 73% Recall: 29% Recall: 68%
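To make the tag collocation idea concrete, here is a hedged sketch that groups the bookmarks carrying a tag by the other tags they share, so that each group stands for one candidate sense. The `induce_senses` helper and the sample bookmarks are hypothetical, and the simple overlap-based grouping is a stand-in for illustration, not the evaluated algorithm.

```python
def induce_senses(tag, bookmarks):
    """Group the bookmarks carrying `tag` by shared collocated tags."""
    # Context of a bookmark: the tags that appear together with `tag`.
    contexts = {b: tags - {tag} for b, tags in bookmarks.items() if tag in tags}

    clusters = []  # list of (bookmark ids, union of their collocated tags)
    for bookmark, context in sorted(contexts.items()):
        merged_ids, merged_context = {bookmark}, set(context)
        untouched = []
        for ids, cluster_context in clusters:
            if cluster_context & merged_context:
                # A shared collocated tag suggests the same sense: merge.
                merged_ids |= ids
                merged_context |= cluster_context
            else:
                untouched.append((ids, cluster_context))
        clusters = untouched + [(merged_ids, merged_context)]
    return sorted(sorted(ids) for ids, _ in clusters)


# Hypothetical bookmarks, each described by its set of tags.
bookmarks = {
    "B1": {"java", "programming", "code"},
    "B2": {"java", "code", "tutorial"},
    "B3": {"java", "travel", "indonesia"},
    "B4": {"java", "indonesia", "island"},
}

senses = induce_senses("java", bookmarks)
print(senses)
```

A user collocation variant would build each context from the users who applied the tag instead of from the co-occurring tags.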
18. Semantic annotation lifecycle
Conclusions: combining human and computational intelligence
free text annotations
What if the users could use semantic annotations instead to leverage semantic technology services?
Semantic annotation = structure and/or meaning
Problem 1: help the user understand the meaning of semantic annotations?
Problem 2: extract (semantic) annotations from contexts of user resource at publishing?
Problem 3: QoS of semantics-enabled services?
Problem 4: semi-automatic semantification of existing annotations
[Diagram: User, Context, Semantic search, …, Reasoning]
19. Conclusions
• We developed and evaluated a meaning summarization algorithm
• We developed a “semantic folksonomy” evaluation platform
• We studied the effect of semantics on social tagging systems:
– how much can semantics help?
– how much does the user need to be involved?
– how can human and computer intelligence be combined in the generation and consumption of semantic annotations?
• We developed and evaluated a knowledge base enrichment
algorithm
• We built and used a gold standard dataset for evaluating:
– Word Sense Disambiguation
– Tag Preprocessing
– Semantic Search
– Semantic Convergence
21. Publications
• Semantic Disambiguation in Folksonomy: a Case Study
Pierre Andrews, Juan Pane, and Ilya Zaihrayeu;
Advanced Language Technologies for Digital Libraries, Springer’s LNCS.
• Semantic Annotation of Images on Flickr
Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu;
ESWC 2011
• A Classification of Semantic Annotation Systems
Pierre Andrews, Sergey Kanshin, Juan Pane, and Ilya Zaihrayeu;
Semantic Web Journal – second review phase
• Sense Induction in Folksonomies
Pierre Andrews, Juan Pane, and Ilya Zaihrayeu;
IJCAI-LHD 2011 – under review
• Evaluating the Quality of Service in Semantic Annotation Systems
Ilya Zaihrayeu, Pierre Andrews, and Juan Pane;
in preparation
22. WP 2 TIMELINE AND DELIVERABLES
Months: 0 6 12 18 24 30 36
Task 2.1: Designing models (UIBK)
• D2.1.1: State of the Art and requirements from the use case partners
• D2.1.2: Specification of the model
Task 2.2: Designing methods (UNITN)
• D2.2.1: Report on bootstrapping semantic annotations and on reaching consensus in the use of semantics
• D2.2.2+D2.2.3: Report on linking semantic annotations to external sources and on keeping them up-to-date when the underlying semantic model changes
• D2.4: Report on the refinement of the proposed models, methods and semantic search
Task 2.3: Research on Information Retrieval (IR) methods for semantic content (ONTO)
• D2.3.1: Requirements for semantics-aware IR methods
• D2.3.2: Specification for semantics-aware IR methods
Task 2.4: Models and methods for automatic visual annotation (UTC)
• D2.5: Report on the state of the art, proposed suitable models and methods for automatic visual annotation
Editor's notes
Say how it’s different from the TAGora dataset => we have gold standard preprocessing and disambiguation, with agreement between at least two annotators
• The first platform for building gold standards for the evaluation of concept-based search algorithms, vocabulary convergence algorithms, etc. in folksonomies
• The first gold standard dataset produced and published
• The first evaluation of a keywords-based search algorithm w.r.t. the gold standard semantic search in a folksonomy
• Tag preprocessing algorithm, WSD algorithm, concept-based search algorithm