Semantic Tag Recommendation
1. Exploiting Ontological Relations for
Automatic Semantic Tag Recommendation
Panos Alexopoulos, John Pavlopoulos, Manolis Wallace,
Konstantinos Kafentzis
7th International Conference on Semantic Systems
September 7th, Graz, Austria
1 l June 5, 2012
2. Presentation Outline
Background, Focus & Approach
Tag Recommendation Framework
Example & Experimental Evaluation
Conclusions & Future Work
3. Background
Tagging is a textual annotation technique that involves assigning to a
document terms and phrases that are representative of its semantic
content.
The term “representative” may be interpreted differently depending on
why tagging is employed:
• Classifying a document by means of concepts that represent meaningful
categories for the document's user.
• Summarizing a document's content by means of keywords that constitute
a representative description of what the document specifically talks
about.
• Characterizing a document by means of adjectives that denote some
kind of judgment (e.g. “positive”, “negative”).
In all cases, the automatic generation of tags remains an open issue.
4. Focus
We are interested in tagging not so much as a classification process
but rather as a summarization one.
This means, for example, that we do not wish to determine whether a
given document is about sports or politics but which specific sport
events or politicians it is about.
The challenge in this kind of identification is to distinguish between
the keywords that play a central role in the document's meaning and
those that are merely complementary to it.
For example, a piece of news might make reference to many politicians
even when its primary subject is only one of them.
5. Approach & Assumptions
We follow a “detective-like” approach: we try to find evidence within
the document that points towards the correct tag(s).
We assume that we have available for the documents we wish to tag
some comprehensive ontology that describes their domain.
Since we wish to tag the documents with specific entities (e.g. films,
directors, actors etc.) we consider as candidate tags the instances of
the ontology's concepts.
We then consider two fundamental premises:
1. That an instance is more likely to represent the text's meaning when
the text contains many other instances that are ontologically related
to it.
2. That not all relations in a domain ontology are equally important in
the above process.
6. Approach & Assumptions
For example, in a historical document, a given event is a likely tag
when the text also contains persons, locations and other events
related to this event.
Similarly, in a document about cinema, a given film is a likely tag when
the text also contains persons involved in the film (directors, actors,
characters etc.).
However, the relation linking films and film characters is more
important than the relation linking films and directors when it comes
to determining whether the text actually refers to a given film.
The reason is that a text about a specific film is less likely to
mention a character who does not appear in that film than a director
who did not work on it.
7. Proposed Tagging Framework
Two components:
A tagging context model that models the relative importance of
ontological relations to the tag identification process.
A tag recommendation process that determines, for a given text,
the ontology instances that are potential tags for it and
recommends to the user the ones with the highest confidence
score.
8. Tagging Context
Based on the first premise, we say that an instance A is supported by
another, related instance B when the presence of B in a text is an
indication that A is a potential tag.
The Tagging Context defines, for each ontology concept, which relations
should be used, and to what extent, for determining the level of support
that the concept's instances “enjoy” from other instances in a given text.
For example, for the concept Film a potential tagging context would be:
hasDirector: 0.7
hasActor: 0.5
hasCharacter: 0.9
That would mean that films are generally supported firstly by characters,
then by directors and lastly by actors that are related to them.
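As a minimal sketch, the tagging context above could be held in a plain mapping from concept to weighted relations. The names `TAGGING_CONTEXT` and `relations_by_importance` are illustrative assumptions, not part of the presented framework; only the three weights come from the example.

```python
# Hypothetical representation of a tagging context:
# concept name -> {relation name: importance weight}.
TAGGING_CONTEXT = {
    "Film": {
        "hasDirector": 0.7,
        "hasActor": 0.5,
        "hasCharacter": 0.9,
    },
}


def relations_by_importance(context, concept):
    """Return the concept's relations ordered from most to least important."""
    weights = context[concept]
    return sorted(weights, key=weights.get, reverse=True)


ranked = relations_by_importance(TAGGING_CONTEXT, "Film")
print(ranked)  # ['hasCharacter', 'hasDirector', 'hasActor']
```

The ordering reproduces the reading given above: characters first, then directors, then actors.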
9. Tag Recommendation Process
1. We define the Tagging Context Model for the given scenario
• We select the ontology concepts whose instances are going to be used as
candidate tags (tag concepts) in the given scenario.
• For each of these concepts we determine the ontological relations that
have this concept within their domain and we define their relative
importance in the tagging process.
• We extract from the text the terms that match instances of the tag
concepts, as well as those that match instances of the concepts that
fall within the range of the tag concepts' ontological relations.
• E.g. in the film example we would extract films, directors, actors and
characters.
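The extraction step can be illustrated with a deliberately naive sketch. It assumes instance labels are available as plain strings and uses case-insensitive substring matching; the slides do not say which matching technique the authors used, and the `INSTANCES` table and `extract_instances` helper are hypothetical.

```python
# Toy instance labels grouped by concept; a real ontology would supply these.
INSTANCES = {
    "Film": ["Annie Hall", "Manhattan"],
    "Director": ["Woody Allen"],
    "Actor": ["Diane Keaton"],
    "Character": ["Alvy Singer"],
}


def extract_instances(text, instances):
    """Return {concept: [labels found in text]} via case-insensitive substring match."""
    lowered = text.lower()
    found = {}
    for concept, labels in instances.items():
        hits = [label for label in labels if label.lower() in lowered]
        if hits:
            found[concept] = hits
    return found


text = "Woody Allen plays Alvy Singer in Annie Hall."
extracted = extract_instances(text, INSTANCES)
print(extracted)
```

Substring matching ignores name variants and homonyms, so a realistic system would need a proper entity-matching component in its place.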
10. Tag Recommendation Process
• Using the ontological relations of the context, we derive from the
above set of terms the instances of the tag concepts that are related
to them. The derived instances comprise the candidate tags for the text.
• E.g. in the film example, candidate tags would be the extracted films as
well as those that are related to the extracted actors, directors and
characters.
• For each candidate tag we compute the set of text-found instances
that support the candidacy of the tag to a degree derived from the
tagging context.
• E.g. for the candidate tag “Annie Hall” a possible set could be:
• Woody Allen: 0.7
• Annie Hall: 1
• Alvy Singer: 0.9
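The derivation of candidate tags and their support sets might be sketched as follows. The triples are toy data, and two choices are assumptions made for this sketch: a film mentioned by name supports itself with degree 1 (as in the “Annie Hall” example), and when an instance is linked to a film through several relations the highest weight is kept.

```python
# Film tagging context from the example: relation -> weight.
CONTEXT = {"hasDirector": 0.7, "hasActor": 0.5, "hasCharacter": 0.9}

# Toy (film, relation, instance) triples standing in for the ontology.
TRIPLES = [
    ("Annie Hall", "hasDirector", "Woody Allen"),
    ("Annie Hall", "hasCharacter", "Alvy Singer"),
    ("Manhattan", "hasDirector", "Woody Allen"),
]


def build_support_sets(found_films, found_others, triples, context):
    """Return {candidate film: {supporting instance: support degree}}."""
    supports = {}
    # A film found in the text is a candidate and supports itself fully.
    for film in found_films:
        supports.setdefault(film, {})[film] = 1.0
    # Films related to other extracted instances also become candidates.
    for film, relation, instance in triples:
        if instance in found_others and relation in context:
            degrees = supports.setdefault(film, {})
            degrees[instance] = max(degrees.get(instance, 0.0), context[relation])
    return supports


support_sets = build_support_sets(
    {"Annie Hall"}, {"Woody Allen", "Alvy Singer"}, TRIPLES, CONTEXT
)
print(support_sets)
```

Note that “Manhattan” becomes a candidate even though it is never mentioned, because an extracted director is related to it.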
11. Tag Recommendation Process
1. For each candidate tag we calculate three scores:
• Its Ontological Support, namely the degree to which the instances
found within the text imply that the candidate tag actually characterizes
the text.
• Its Ontological Ambiguity, namely the degree to which the ontology
entities that are found within the text and support the tag, support
other tags as well.
• Its Tag Confidence, namely the relative confidence that the tag actually
characterizes the text.
12. Ontological Support
Intuitively, the ontological support of a candidate tag grows with both
the number and the support degrees of the tag's supporting text instances.
Thus we calculate the overall support score for a given candidate tag
as the sum of its partial supports, weighted by the relative number
of the instances that support it.
This weighting is important as our aim is to compute for the tag a
support that is relative to the supports of the other tags.
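One possible reading of this description is sketched below. It assumes that “relative number” means the tag's supporter count divided by the number of distinct instances extracted from the text; the slides do not give the exact formula, so the function is an illustration rather than the authors' definition.

```python
def ontological_support(support_set, total_instances):
    """Sum of partial supports, scaled by the share of text instances that support the tag.

    Assumed form: (sum of degrees) * (number of supporters / total_instances).
    """
    if total_instances == 0:
        return 0.0
    return sum(support_set.values()) * len(support_set) / total_instances


# Support sets taken from the "Annie Hall" example; "Manhattan" is toy data.
annie_hall = {"Woody Allen": 0.7, "Annie Hall": 1.0, "Alvy Singer": 0.9}
manhattan = {"Woody Allen": 0.7}

# Three distinct instances were extracted from the text in this toy case.
support_annie = ontological_support(annie_hall, total_instances=3)
support_manhattan = ontological_support(manhattan, total_instances=3)
print(support_annie, support_manhattan)
```

Under this reading, a tag backed by one instance out of three is penalised twice: once by its smaller sum and once by its smaller share of the supporting instances.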
13. Ontological Ambiguity
It is the degree to which the ontology entities that are found within the
text and support the tag, support other tags as well.
Intuitively, for a given tag, the more additional tags its support set
also supports, the higher the tag's ambiguity.
The ambiguity between two tags is calculated as follows:
• First we find the set of text-extracted instances that support both
tags.
• Then we calculate the degree to which this common set contributes to
each tag's full support set, based on the following intuition:
• If the two contributions are high and comparable, then the ambiguity
is high.
• If they are low and comparable, then the ambiguity is low.
• If they are non-comparable (i.e. one relatively high and one
relatively low), then again the ambiguity is low.
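The three cases can be reproduced by a simple stand-in function. The sketch below assumes the contribution of the common set is its share of a tag's total support degree, and uses the minimum of the two contributions as the ambiguity; this is one function that matches the stated intuition, not necessarily the one used by the authors.

```python
def contribution(common, support_set):
    """Share of a tag's total support degree that comes from the common instances."""
    total = sum(support_set.values())
    if total == 0:
        return 0.0
    return sum(support_set[name] for name in common) / total


def pairwise_ambiguity(set_a, set_b):
    """Assumed ambiguity: high only when both contributions are high."""
    common = set(set_a) & set(set_b)
    return min(contribution(common, set_a), contribution(common, set_b))


annie_hall = {"Woody Allen": 0.7, "Annie Hall": 1.0, "Alvy Singer": 0.9}
manhattan = {"Woody Allen": 0.7}

# Common set is {"Woody Allen"}: a small share of Annie Hall's support,
# but all of Manhattan's, so the contributions are non-comparable.
ambiguity = pairwise_ambiguity(annie_hall, manhattan)
print(ambiguity)
```

Taking the minimum yields a high value only when both contributions are high, and a low value when either both are low or the two differ widely.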
15. Tag Confidence
To derive an overall score for the likelihood that a given tag actually
characterizes the text, we use its ambiguity score to “adjust” its initial
ontological support.
The intuition here is that the more ambiguous the tag, the less “reliable”
its support. Thus, the overall tag confidence is calculated as follows:
In this calculation, the tuning variable w is a weight that adjusts the
influence of the ambiguity score on the total confidence.
The above is not an absolute measure of tag confidence but rather a relative
one that enables the ranking of the candidate tags.
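Since the formula itself is not reproduced in this transcript, the following is only a sketch of one plausible form: the support is discounted linearly by the ambiguity, with w in [0, 1] controlling the strength of the discount. Both the formula and the toy scores are assumptions for illustration.

```python
def tag_confidence(support, ambiguity, w=0.5):
    """Assumed form: support discounted by w times the ambiguity (both w and ambiguity in [0, 1])."""
    return support * (1.0 - w * ambiguity)


def rank_tags(scores, w=0.5):
    """Rank tags by confidence; scores maps tag -> (support, ambiguity)."""
    confidences = {
        tag: tag_confidence(support, ambiguity, w)
        for tag, (support, ambiguity) in scores.items()
    }
    return sorted(confidences.items(), key=lambda item: item[1], reverse=True)


# Toy scores, not taken from the evaluation.
scores = {"Annie Hall": (2.6, 0.27), "Manhattan": (0.23, 0.27)}
ranking = rank_tags(scores, w=0.5)
print(ranking[0][0])
```

With w = 0 the ranking falls back to raw ontological support; larger values of w push ambiguous tags further down the list.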
16. Example
Tagging a film review about the film “Steel”
“How's this for diminishing returns? In BATMAN AND ROBIN, George Clooney
battled Arnold Schwarzenegger. In SPAWN, it was Michael Jai White versus John
Leguizamo. In STEEL, the third and presumably final superhero stretch of the
summer, Shaquille O'Neal dons a high-tech, hand-crafted suit of armor to combat
the earth-shaking, world-shattering, super-duper-ultra evil menace of... Judd
Nelson? ...”
17. Experimental Evaluation
Dataset of 1000 film reviews randomly selected from IMDb.
Film ontology derived from Freebase
• ~148000 films, ~36000 directors, ~145000 actors, ~63000 characters
For each review we generated two sets of recommended tags:
• One using a basic key phrase extraction methodology, in which we
assumed that the film mentioned most frequently in the text is the one
the text is talking about.
• One using our proposed method with the tagging context of the
example.
We then measured the success of each method by counting the number of
cases in which the tag with the highest confidence was the correct one.
Our method achieved a score of 89%, while the basic method achieved
only 45%.
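The success measure described here amounts to top-1 accuracy, which can be computed as in the sketch below. The review identifiers and film titles are invented toy data and do not come from the evaluation dataset.

```python
def top1_accuracy(predictions, gold):
    """Fraction of documents whose top-ranked tag equals the correct tag."""
    if not gold:
        raise ValueError("gold must contain at least one document")
    correct = sum(1 for doc, tag in gold.items() if predictions.get(doc) == tag)
    return correct / len(gold)


# Toy data: document id -> correct film, and document id -> top-ranked tag.
gold = {"r1": "Steel", "r2": "Spawn", "r3": "Annie Hall", "r4": "Manhattan"}
predicted = {"r1": "Steel", "r2": "Spawn", "r3": "Annie Hall", "r4": "Annie Hall"}

accuracy = top1_accuracy(predicted, gold)
print(accuracy)  # 0.75
```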
18. Summary
We proposed a novel method for automatically generating and
recommending semantic tags for text documents in an effort to
summarize the intended meaning of their content.
Our approach is based on the customized use of domain-specific
ontological relations for extracting and evaluating “evidence” from
within the text that may identify the correct tag(s) in the given
tagging scenario.
A comprehensive experimental evaluation of the method
highlighted its high effectiveness for the tag recommendation
task.
19. Future Work
Establishment of the effectiveness of our method in more complex
and ambiguous domains and with larger datasets.
Extension/enhancement of the method in order to:
• Cope with knowledge incompleteness and/or inconsistency.
• Determine the values of the tagging context in an automated way.
• Determine the exact number of appropriate tags for the document.
20. Thank you…
Panos Alexopoulos
Semantic Solutions Architect - Researcher
IMC Technologies SA
2 Marathonos Str. & 360A Kifissias Av.
15233, Athens, Greece
T: +30 210 6927 378
F: +30 210 6926 813
E: palexopoulos@imc.com.gr
W: www.imc.com.gr