Exploiting Ontological Relations for
           Automatic Semantic Tag Recommendation


             Panos Alexopoulos, John Pavlopoulos, Manolis Wallace,
                            Konstantinos Kafentzis




                     7th International Conference on Semantic Systems
                                September 7th, Graz, Austria
1 l June 5, 2012
Presentation Outline
 Background, Focus & Approach

 Tag Recommendation Framework

 Example & Experimental Evaluation

 Conclusions & Future Work




Background
   Tagging is a textual annotation technique that involves assigning to a
    document terms and phrases that are representative of its semantic
    content.

   The term “representative” may have a different interpretation
    depending on the reason why tagging is employed:

         Classifying a document by means of concepts that represent meaningful
          categories for the document user.
         Summarizing a document's content by means of keywords that constitute
          a representative description of what the document “talks specifically
          about”.
         Characterizing a document by means of appropriate adjectives that denote
          some kind of judgment (e.g. “positive”, “negative”).

   In all cases, the automatic generation of tags remains an open issue.

Focus
   We are interested in tagging not so much as a classification process
    but rather as a summarization one.

   This means, for example, that we do not wish to determine whether a
    given document is about sports or politics but which specific sport
    events or politicians it is about.

   The challenge in this kind of identification is to be able to distinguish
    between the keywords that play a central role in the document's
    meaning and those that are just complementary to it.

   For example, a piece of news might make reference to many politicians
    even when its primary subject is only one of them.



Approach & Assumptions
   We follow a “detective-like” approach and we try to find evidence
    within the document that points towards the correct tag(s).

   We assume that we have available for the documents we wish to tag
    some comprehensive ontology that describes their domain.

   Since we wish to tag the documents with specific entities (e.g. films,
    directors, actors, etc.), we consider as candidate tags the instances of the
    ontology's concepts.

   We then consider two fundamental premises:

        1. That an instance is more likely to represent the text’s meaning when
           the text contains many other instances ontologically related to it.
        2. That not all relations in a domain ontology are equally important in the
           above process.

Approach & Assumptions
   For example, in a historical document, a given event is a likely tag
    when the text also contains persons, locations and other events
    related to this event.

   Similarly, in a document about cinema, a given film is a likely tag when
    the text also contains persons involved in the film (directors, actors,
    characters etc.).

   However, the relation linking films and film characters is more
    important than the relation linking films and directors when it comes
    to determining whether the text actually refers to a given film.

   The reason is that a text about a specific film is less likely to
    mention a character who does not appear in it than a director who
    did not direct it.

Proposed Tagging Framework
   Two components:

            A tagging context model that models the relative importance of
             ontological relations to the tag identification process.

            A tag recommendation process that determines, for a given text,
             the ontology instances that are potential tags for it and
             recommends to the user the ones with the highest confidence
             score.




Tagging Context
   Based on the first assumption, we say that an instance A is supported by
    another (related to it) instance B when the existence of B in a text is an
    indication that A is a potential tag.

   The Tagging Context defines, for each ontology concept, which relations
    should be used, and to what extent, for determining the level of support
    that the concept’s instances “enjoy” from other instances in a given text.

   For example, for the concept Film a potential tagging context would be:
            hasDirector: 0.7
            hasActor: 0.5
            hasCharacter: 0.9

   That would mean that films are generally supported firstly by characters,
    then by directors and lastly by actors that are related to them.
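
A minimal sketch of how such a tagging context could be held in code. The dictionary layout and the helper name relations_by_importance are illustrative assumptions; only the concept, the relation names and the weights come from the example above.

```python
# Hypothetical in-memory form of the tagging context for the Film example.
# Only the relation names and weights are taken from the slide.
TAGGING_CONTEXT = {
    "Film": {"hasDirector": 0.7, "hasActor": 0.5, "hasCharacter": 0.9},
}


def relations_by_importance(concept, context=TAGGING_CONTEXT):
    """Return the relations defined for a concept, most supportive first."""
    weights = context.get(concept, {})
    return sorted(weights, key=weights.get, reverse=True)


print(relations_by_importance("Film"))
```

For Film this orders the relations as characters, then directors, then actors, which matches the reading given above.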


Tag Recommendation Process
1.    We define the Tagging Context Model for the given scenario.
        •      We select the ontology concepts whose instances are going to be used as
               candidate tags (tag concepts) in the given scenario.

        •      For each of these concepts we determine the ontological relations that
               have this concept within their domain and we define their relative
               importance in the tagging process.

2.    We extract from the text the terms that match instances of the tag
      concepts as well as those that match instances of the concepts that
      fall within the range of the tag concepts' ontological relations.

        •      E.g. in the film example we would extract films, directors, actors and
               characters.



Tag Recommendation Process
3.      Using the ontological relations of the context, we derive from the
        above set of terms the instances of the tag concepts that are either
        found in the text themselves or related to terms found in it. These
        instances comprise the candidate tags for the text.

        •           E.g. in the film example, candidate tags would be the extracted films as
                    well as those that are related to the extracted actors, directors and
                    characters.

4.      For each candidate tag we compute the set of text-found instances
        that support the candidacy of the tag, each to a degree derived from the
        tagging context.

        •           E.g. for the candidate tag “Annie Hall” a possible set could be:
                    •     Woody Allen: 0.7
                    •     Annie Hall: 1
                    •     Alvy Singer: 0.9

Tag Recommendation Process
5.      For each candidate tag we calculate three scores:

        •           Its Ontological Support, namely the degree to which the instances
                    found within the text imply that the candidate tag actually characterizes
                    the text.

        •           Its Ontological Ambiguity, namely the degree to which the ontology
                    entities that are found within the text and support the tag, support
                    other tags as well.

        •           Its Tag Confidence, namely the relative confidence that the tag actually
                    characterizes the text.
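
The extraction, candidate derivation and support-set steps of the process above can be sketched as follows. This is only an illustration under stated assumptions: the toy triples, the substring matching and the rule of keeping the strongest relation when several link the same pair are not taken from the slides, and a real system would use a proper entity recognizer over the full ontology.

```python
# Toy ontology (hypothetical data): (film, relation, related instance) triples.
TRIPLES = [
    ("Annie Hall", "hasDirector", "Woody Allen"),
    ("Annie Hall", "hasActor", "Woody Allen"),
    ("Annie Hall", "hasCharacter", "Alvy Singer"),
    ("Manhattan", "hasDirector", "Woody Allen"),
]
FILM_CONTEXT = {"hasDirector": 0.7, "hasActor": 0.5, "hasCharacter": 0.9}


def extract_instances(text, triples):
    """Find ontology instances in the text by naive case-insensitive matching."""
    labels = {t[0] for t in triples} | {t[2] for t in triples}
    lowered = text.lower()
    return {label for label in labels if label.lower() in lowered}


def support_sets(found, triples, context):
    """Map each candidate film to {text instance: support degree}."""
    films = {t[0] for t in triples}
    sets = {}
    for film, relation, other in triples:
        if other in found and relation in context:
            current = sets.setdefault(film, {})
            # Assumption: if several relations link the pair, keep the strongest.
            current[other] = max(current.get(other, 0.0), context[relation])
    for film in films & found:
        # A film mentioned by name supports itself with degree 1.
        sets.setdefault(film, {})[film] = 1.0
    return sets


text = "Woody Allen plays Alvy Singer in Annie Hall."
found = extract_instances(text, TRIPLES)
print(support_sets(found, TRIPLES, FILM_CONTEXT))
```

On this toy input, “Annie Hall” receives the support set shown in the example above, while “Manhattan” becomes a second candidate supported only through its director.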




Ontological Support
   Intuitively, the ontological support of a candidate tag grows with
    the number and the support degrees of the tag's supporting text instances.

   Thus we calculate the overall support score for a given candidate tag
    as the sum of its partial supports, weighted by the relative number
    of the instances that support it.

   This weighting is important as our aim is to compute for the tag a
    support that is relative to the supports of the other tags.
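
A small sketch of the support score described above. The slides do not spell out the weighting, so reading “relative number” as the share of all text-found instances that support the tag is an assumption, as is the function name.

```python
def ontological_support(support_set, n_text_instances):
    """Sum of partial supports, scaled by the share of text instances involved.

    Assumption: "relative number" is read as
    len(support_set) / n_text_instances.
    """
    if n_text_instances == 0:
        return 0.0
    return sum(support_set.values()) * len(support_set) / n_text_instances


annie_hall = {"Woody Allen": 0.7, "Annie Hall": 1.0, "Alvy Singer": 0.9}
manhattan = {"Woody Allen": 0.7}

# Three instances were found in the text in total.
print(ontological_support(annie_hall, 3))
print(ontological_support(manhattan, 3))
```

Under this reading, a tag backed by all three found instances scores well above one backed by a single shared director.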




Ontological Ambiguity
   It is the degree to which the ontology entities that are found within the
    text and support the tag, support other tags as well.

   Intuitively, for a given tag, the more additional tags its support set also
    supports, the higher the tag’s ambiguity.

   The ambiguity between two tags is calculated as follows:
         First we find the set of text-extracted instances that commonly support
          the two tags.
         Then we calculate the contribution degree of the common set to each
          tag's full support set, based on the following intuition:
             If the two contributions are high and comparable then the ambiguity
                is high.
             If they are low and comparable then the ambiguity is low.
             If they are non-comparable (i.e. one relatively high and one relatively
                low) then again the ambiguity is low.
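
One way to turn the intuition above into a number is sketched below. The exact formula is not given in this text, so both choices here are assumptions: the contribution degree is taken as the share of a tag's total support that comes from the common instances, and the pairwise ambiguity is taken as the minimum of the two contributions, which is high only when both are high.

```python
def contribution(common, support_set):
    """Share of a tag's total support that comes from the common instances."""
    total = sum(support_set.values())
    if total == 0:
        return 0.0
    return sum(support_set[i] for i in common) / total


def pairwise_ambiguity(set_a, set_b):
    """Ambiguity between two tags, given their support sets."""
    common = set_a.keys() & set_b.keys()
    # Assumption: min() stands in for the unspecified formula. It is high
    # when both contributions are high, and low when both are low or when
    # one is high and the other low.
    return min(contribution(common, set_a), contribution(common, set_b))


annie_hall = {"Woody Allen": 0.7, "Annie Hall": 1.0, "Alvy Singer": 0.9}
manhattan = {"Woody Allen": 0.7}
print(pairwise_ambiguity(annie_hall, manhattan))
```

Here the shared director accounts for all of the support of “Manhattan” but only a small part of the support of “Annie Hall”, so the two contributions are non-comparable and the ambiguity stays low.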

Tag Confidence
   To derive an overall score for the likelihood that a given tag actually
    characterizes the text, we use its ambiguity score to “adjust” its initial
    ontological support.

   The intuition here is that the more ambiguous the tag, the less “reliable”
    its support. Thus, the overall tag confidence is calculated as follows:




    Tuning variable w is a weight that adjusts the influence of the ambiguity score
     on the total confidence.

    The above is not an absolute measure of tag confidence but rather a relative
     one that enables the ranking of the candidate tags.
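
The adjustment described above can be illustrated with a short sketch. The confidence formula itself is not reproduced in this text, so the form support * (1 - w * ambiguity), with w between 0 and 1, is purely an assumption chosen to match the stated intuition; the function names and the sample scores are hypothetical as well.

```python
def tag_confidence(support, ambiguity, w=0.5):
    """Discount a tag's support by its ambiguity.

    Assumption: this form is illustrative only; w in [0, 1] controls how
    strongly ambiguity lowers the support.
    """
    return support * (1.0 - w * ambiguity)


def rank(scores, w=0.5):
    """Order tags by confidence; scores maps tag -> (support, ambiguity)."""
    return sorted(
        scores,
        key=lambda tag: tag_confidence(*scores[tag], w=w),
        reverse=True,
    )


# Hypothetical scores for two candidate tags.
scores = {"Manhattan": (0.23, 0.27), "Annie Hall": (2.6, 0.27)}
print(rank(scores))
```

Because the score is only used for ranking, any monotone variant of the formula would serve the same purpose; with w = 0 the ranking reduces to ranking by support alone.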
Example
   Tagging a film review about the film “Steel”

“How's this for diminishing returns? In BATMAN AND ROBIN, George Clooney
battled Arnold Schwarzenegger. In SPAWN, it was Michael Jai White versus John
Leguizamo. In STEEL, the third and presumably final superhero stretch of the
summer, Shaquille O'Neal dons a high-tech, hand-crafted suit of armor to combat
the earth-shaking, world-shattering, super-duper-ultra evil menace of... Judd
Nelson? ...”




Experimental Evaluation
   Dataset of 1000 film reviews randomly selected from IMDb.
   Film ontology derived from Freebase:
        •    ~148000 films, ~36000 directors, ~145000 actors, ~63000 characters

   For each review we generated two sets of recommended tags:
         One using a basic key phrase extraction methodology in which we
          assumed that the most frequent film found within the text is the one the
          text is talking about.
         One using our proposed method with the tagging context of the
          example.

   We then measured the success of each method by counting the
    number of cases in which the tag with the highest confidence was the
    correct one.

   Our method achieved a score of 89%, while the basic method achieved only 45%.
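
The success measure used above amounts to top-1 accuracy. A minimal sketch, assuming each method returns a ranked list of tags per review; the function name and the three sample reviews are hypothetical and are not the evaluation data.

```python
def top1_accuracy(predictions, gold):
    """Fraction of reviews whose highest-ranked tag is the correct film.

    predictions maps review id -> ranked list of tags (best first);
    gold maps review id -> the correct film.
    """
    if not predictions:
        return 0.0
    hits = sum(
        1
        for review, ranked in predictions.items()
        if ranked and ranked[0] == gold[review]
    )
    return hits / len(predictions)


# Hypothetical rankings for three reviews.
predictions = {"r1": ["Steel", "Spawn"], "r2": ["Spawn", "Steel"], "r3": []}
gold = {"r1": "Steel", "r2": "Steel", "r3": "Steel"}
print(top1_accuracy(predictions, gold))
```

A review for which a method proposes no tag at all counts as a miss here, which is one possible convention among several.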

Summary
 We proposed a novel method for automatically generating and
    recommending semantic tags for text documents in an effort to
    summarize the intended meaning of their content.

 Our approach is based on the customized utilization of
    domain-specific ontological relations for extracting and
    evaluating “evidence” from within the text that may identify the
    correct tag(s) in the given tagging scenario.

 An experimental evaluation on 1000 film reviews showed the
    method ranking the correct film first in 89% of cases, against
    45% for a frequency-based baseline.



Future Work
 Establishment of the effectiveness of our method in more complex
    and ambiguous domains and with larger datasets.

 Extension/enhancement of the method in order to:

            Cope with knowledge incompleteness and/or inconsistency.

            Determine the values of the tagging context in an automated way.

            Determine the exact number of appropriate tags for the document.




Thank you…
                    Panos Alexopoulos
                    Semantic Solutions Architect - Researcher


                    IMC Technologies SA
                    2 Marathonos Str. & 360A Kifissias Av.
                    15233, Athens, Greece
                    T: +30 210 6927 378
                    F: +30 210 6926 813
                    E: palexopoulos@imc.com.gr
                    W: www.imc.com.gr




20 l June 5, 2012

Contenu connexe

Similaire à Semantic Tag Recommendation

Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...
Panos Alexopoulos
 
Rhetorical Analysis RhetoricUsed in advertising, p.docx
Rhetorical Analysis RhetoricUsed in advertising, p.docxRhetorical Analysis RhetoricUsed in advertising, p.docx
Rhetorical Analysis RhetoricUsed in advertising, p.docx
SUBHI7
 
Eng 103 how to write compare and contrast essays
Eng 103   how to write compare and contrast essaysEng 103   how to write compare and contrast essays
Eng 103 how to write compare and contrast essays
Nafis Kamal
 
Project II Annotated Bibliography of Film ReviewsArticles + Shor.docx
Project II Annotated Bibliography of Film ReviewsArticles + Shor.docxProject II Annotated Bibliography of Film ReviewsArticles + Shor.docx
Project II Annotated Bibliography of Film ReviewsArticles + Shor.docx
bfingarjcmc
 
Module 7 Discussion Board Algebra1. What does it mean when s.docx
Module 7 Discussion Board Algebra1. What does it mean when s.docxModule 7 Discussion Board Algebra1. What does it mean when s.docx
Module 7 Discussion Board Algebra1. What does it mean when s.docx
moirarandell
 
How to use this template To use this template, replace the inst.docx
How to use this template To use this template, replace the inst.docxHow to use this template To use this template, replace the inst.docx
How to use this template To use this template, replace the inst.docx
wellesleyterresa
 
13Title of Your EssayYour First and Last Name.docx
13Title of Your EssayYour First and Last Name.docx13Title of Your EssayYour First and Last Name.docx
13Title of Your EssayYour First and Last Name.docx
aulasnilda
 
Acting and Acting StylePrepareAs we have been discussing, .docx
Acting and Acting StylePrepareAs we have been discussing, .docxActing and Acting StylePrepareAs we have been discussing, .docx
Acting and Acting StylePrepareAs we have been discussing, .docx
nettletondevon
 

Similaire à Semantic Tag Recommendation (20)

A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
Lac presentation
Lac presentationLac presentation
Lac presentation
 
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTEA FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
A FILM SYNOPSIS GENRE CLASSIFIER BASED ON MAJORITY VOTE
 
O01741103108
O01741103108O01741103108
O01741103108
 
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...
Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named...
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONSONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS
 
Senior High School Reading and Writing SKills
Senior High School Reading and Writing SKillsSenior High School Reading and Writing SKills
Senior High School Reading and Writing SKills
 
Rhetorical Analysis RhetoricUsed in advertising, p.docx
Rhetorical Analysis RhetoricUsed in advertising, p.docxRhetorical Analysis RhetoricUsed in advertising, p.docx
Rhetorical Analysis RhetoricUsed in advertising, p.docx
 
Eng 103 how to write compare and contrast essays
Eng 103   how to write compare and contrast essaysEng 103   how to write compare and contrast essays
Eng 103 how to write compare and contrast essays
 
Project II Annotated Bibliography of Film ReviewsArticles + Shor.docx
Project II Annotated Bibliography of Film ReviewsArticles + Shor.docxProject II Annotated Bibliography of Film ReviewsArticles + Shor.docx
Project II Annotated Bibliography of Film ReviewsArticles + Shor.docx
 
Module 7 Discussion Board Algebra1. What does it mean when s.docx
Module 7 Discussion Board Algebra1. What does it mean when s.docxModule 7 Discussion Board Algebra1. What does it mean when s.docx
Module 7 Discussion Board Algebra1. What does it mean when s.docx
 
Frame-based Sentiment Analysis with Sentilo
Frame-based Sentiment Analysis with SentiloFrame-based Sentiment Analysis with Sentilo
Frame-based Sentiment Analysis with Sentilo
 
MODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptxMODULE 4-Text Analytics.pptx
MODULE 4-Text Analytics.pptx
 
How to use this template To use this template, replace the inst.docx
How to use this template To use this template, replace the inst.docxHow to use this template To use this template, replace the inst.docx
How to use this template To use this template, replace the inst.docx
 
Rhetorical Appeals
Rhetorical AppealsRhetorical Appeals
Rhetorical Appeals
 
13Title of Your EssayYour First and Last Name.docx
13Title of Your EssayYour First and Last Name.docx13Title of Your EssayYour First and Last Name.docx
13Title of Your EssayYour First and Last Name.docx
 
A Survey on Sentiment Mining Techniques
A Survey on Sentiment Mining TechniquesA Survey on Sentiment Mining Techniques
A Survey on Sentiment Mining Techniques
 
Acting and Acting StylePrepareAs we have been discussing, .docx
Acting and Acting StylePrepareAs we have been discussing, .docxActing and Acting StylePrepareAs we have been discussing, .docx
Acting and Acting StylePrepareAs we have been discussing, .docx
 
Senior High School Reading and Writing SKills
Senior High School Reading and Writing SKillsSenior High School Reading and Writing SKills
Senior High School Reading and Writing SKills
 

Plus de Panos Alexopoulos

A Vague Sense Classifier for Detecting Vague Definitions in Ontologies
A Vague Sense Classifier for Detecting Vague Definitions in OntologiesA Vague Sense Classifier for Detecting Vague Definitions in Ontologies
A Vague Sense Classifier for Detecting Vague Definitions in Ontologies
Panos Alexopoulos
 

Plus de Panos Alexopoulos (9)

How many truths can you handle?
How many truths can you handle?How many truths can you handle?
How many truths can you handle?
 
Mind the Semantic Gap
Mind the Semantic GapMind the Semantic Gap
Mind the Semantic Gap
 
Troubleshooting and Optimizing Named Entity Resolution Systems in the Industry
Troubleshooting and Optimizing Named Entity Resolution Systems in the IndustryTroubleshooting and Optimizing Named Entity Resolution Systems in the Industry
Troubleshooting and Optimizing Named Entity Resolution Systems in the Industry
 
Detecting, Measuring and Representing Vagueness in Ontologies
Detecting, Measuring and Representing Vagueness in OntologiesDetecting, Measuring and Representing Vagueness in Ontologies
Detecting, Measuring and Representing Vagueness in Ontologies
 
A Vague Sense Classifier for Detecting Vague Definitions in Ontologies
A Vague Sense Classifier for Detecting Vague Definitions in OntologiesA Vague Sense Classifier for Detecting Vague Definitions in Ontologies
A Vague Sense Classifier for Detecting Vague Definitions in Ontologies
 
Towards Vagueness-Aware Semantic Data
Towards Vagueness-Aware Semantic DataTowards Vagueness-Aware Semantic Data
Towards Vagueness-Aware Semantic Data
 
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven SummarizationTowards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization
 
Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra...
Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra...Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra...
Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra...
 
Vagueness in Semantic Information Management
Vagueness in Semantic Information ManagementVagueness in Semantic Information Management
Vagueness in Semantic Information Management
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Semantic Tag Recommendation

  • 1. Exploiting Ontological Relations for Automatic Semantic Tag Recommendation Panos Alexopoulos, John Pavlopoulos, Manolis Wallace, Konstantinos Kafentzis 7th International Conference on Semantic Systems September 7th, Graz, Austria 1 l June 5, 2012
  • 2. Presentation Outline  Background, Focus & Approach  Tag Recommendation Framework  Example & Experimental Evaluation  Conclusions & Future Work 2 l June 5, 2012
  • 3. Background  Tagging is a textual annotation technique that involves assigning to a document terms and phrases that are representative of its semantic content.  The term “representative” may have a different interpretation depending on the reason why tagging is employed:  Classifying a document by means of concepts that represent meaningful categories for the document user.  Summarizing a document's content by means of keywords that constitute a representative description of what the document «talks specifically about»  Characterizing a document by means of proper adjectives that denote some kind of judgment(e.g. “positive“, “negative“)  In all cases, the automatic generation of tags remains an open issue. 3 l June 5, 2012
  • 4. Focus  We are interested in tagging not so much as a classification process but rather as a summarization one.  This means, for example, that we do not wish to determine whether a given document is about sports or politics but which specific sport events or politicians it is about.  The challenge in this kind of identification is to be able to distinguish between the keywords that play a central role to the document's meaning and those that are just complementary to it.  For example, a piece of news might make reference to many politicians even when its primary subject is only one of them. 4 l June 5, 2012
  • 5. Approach & Assumptions  We follow a “detective-like” approach and we try to find evidence within the document that point towards the correct tag(s).  We assume that we have available for the documents we wish to tag some comprehensive ontology that describes their domain.  Since we wish to tag the documents with specific entities (e.g. films, director, actors etc) we consider as candidate tags the instances of the ontology's concepts.  We then consider two fundamental premises: 2. That an instance is more likely to represent the text‘s meaning when there are many other ontologically related to it instances in the text. 3. That not all relations in a domain ontology are equally important in the above process. 5 l June 5, 2012
  • 6. Approach & Assumptions  For example, in a historical document, a given event is a likely tag when the text also contains persons, locations and other events related to this event.  Similarly, in a document about cinema, a given film is a likely tag when the text also contains persons involved in the film (directors, actors, characters etc.).  However, the relation linking films and films characters is more important than the relation linking films and directors when it comes to determining whether the text actually refers to a given film.  The reason is that the reference within a specific film text of a film character that wasn't part of it, is less likely than the reference of a director. 6 l June 5, 2012
  • 7. Proposed Tagging Framework  Two components:  A tagging context model that models the relative importance of ontological relations to the tag identification process.  A tag recommendation process that determines, for a given text, the ontology instances that are potential tags for it and recommends to the user the ones with the highest confidence score. 7 l June 5, 2012
  • 8. Tagging Context  Based on the first assumption, we say than an instance A is supported by another (related to it) instance B when the existence of B in a text is an indication that A is a potential tag.  The Tagging Context defines for each ontology concept which relations and to what extent should be used for determining the level of support that the concept’s instances “enjoy” from other instances in a given text.  For example, for the concept Film a potential tagging context would be:  hasDirector: 0.7  hasActor: 0.5  hasCharacter: 0.9  That would mean that films are generally supported firstly by characters, then by directors and lastly by actors that are related to them. 8 l June 5, 2012
  • 9. Tag Recommendation Process 1. We define the Tagging Context Model for the given scenario • We select the ontology concepts whose instances are going to be used as candidate tags (tag concepts) in the given scenario. • For each of these concepts we determine the ontological relations that have this class within their domain and we define their relative importance in the tagging process. • We extract from the text the terms that match to instances of the tag concepts as well those that match to instances of the concepts that fall within the range of the tag ontological relations. • E.g. in the film example we would extract films, directors, actors and characters. 9 l June 5, 2012
  • 10. Tag Recommendation Process • Using the ontological relations of the context we derive from the above set of terms those that are related to instances of the tag concepts. The derived terms comprise the candidate tags for the text • E.g. in the film example, candidate tags would be the extracted films as well as those that are related to the extracted actors, directors and characters. • For each candidate tag we compute the set of text-found instances that support the candidacy of the tag to a degree derived from the tagging context. • E.g. for the candidate tag “Annie Hall” a possible set could be: • Woody Allen:0.7 • Annie Hall: 1 • Alvy Singer: 0.9 10 l June 5, 2012
  • 11. Tag Recommendation Process 1. For each of the candidate tag we calculate three scores: • Its Ontological Support, namely the degree to which the instances found within the text imply that the candidate tag actually characterizes the text. • Its Ontological Ambiguity, namely the degree to which the ontology entities that are found within the text and support the tag, support other tags as well. • Its Tag Confidence, namely the relative confidence that the tag actually characterizes the text. 11 l June 5, 2012
  • 12. Ontological Support: Intuitively, the ontological support of a candidate tag grows with the number and the support degrees of the tag's supporting text instances. Thus we calculate the overall support score for a given candidate tag as the sum of its partial supports, weighted by the relative number of the instances that support it. This weighting is important as our aim is to compute for the tag a support that is relative to the supports of the other tags.
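One way to read this description is sketched below. The interpretation of “relative number” as the share of all text-found instances that support the tag is an assumption, as is the function name `ontological_support`; the exact formula is not given in the transcript.

```python
def ontological_support(support_set, total_found):
    """Sum of partial supports, scaled by the share of found instances backing the tag.

    ASSUMPTION: "relative number" is read as len(support_set) / total_found.
    """
    if total_found == 0:
        return 0.0
    relative_size = len(support_set) / total_found
    return sum(support_set.values()) * relative_size


# Support set from the "Annie Hall" example; the "Manhattan" entry is illustrative.
supports = {
    "Annie Hall": {"Woody Allen": 0.7, "Annie Hall": 1.0, "Alvy Singer": 0.9},
    "Manhattan": {"Woody Allen": 0.7},
}
total_found = 3  # Woody Allen, Annie Hall, Alvy Singer

scores = {tag: ontological_support(s, total_found) for tag, s in supports.items()}
best = max(scores, key=scores.get)
print(best)  # Annie Hall
```

Under this reading, a tag backed by one of three found instances keeps only a third of its summed support, which is what makes the scores comparable across candidates.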
  • 13. Ontological Ambiguity: It is the degree to which the ontology entities that are found within the text and support the tag support other tags as well. Intuitively, for a given tag, the more additional tags its support set also supports, the higher the tag's ambiguity. The ambiguity between two tags is calculated as follows: First we find the set of text-extracted instances that support both tags. Then we calculate the contribution degree of this common set to each tag's whole support set, based on the following intuition: If the two contributions are high and comparable then the ambiguity is high. If they are low and comparable then the ambiguity is low. If they are non-comparable (i.e. one relatively high and one relatively low) then again the ambiguity is low.
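The three intuitions above can be illustrated with a small stand-in function. This is not the authors' formula, which is not preserved in the transcript: taking the minimum of the two contribution degrees is simply one function that is high only when both contributions are high, and measuring a contribution as a share of summed support degrees is likewise an assumption.

```python
def contribution(common, support_set):
    """Share of a tag's total support that comes from the common instances."""
    total = sum(support_set.values())
    if total == 0:
        return 0.0
    return sum(support_set[i] for i in common) / total


def pairwise_ambiguity(support_a, support_b):
    """ASSUMED stand-in: ambiguity is the smaller of the two contribution degrees.

    high/high -> high, low/low -> low, high/low -> low.
    """
    common = support_a.keys() & support_b.keys()
    return min(contribution(common, support_a), contribution(common, support_b))


annie_hall = {"Woody Allen": 0.7, "Annie Hall": 1.0, "Alvy Singer": 0.9}
manhattan = {"Woody Allen": 0.7}  # illustrative second candidate

# The shared director is all of Manhattan's support but a small part of
# Annie Hall's, so the contributions are non-comparable and ambiguity is low.
print(round(pairwise_ambiguity(annie_hall, manhattan), 2))
```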
  • 15. Tag Confidence: To derive an overall score for the likelihood that a given tag actually characterizes the text, we use its ambiguity score in order to “adjust” its initial ontological support. The intuition here is that the more ambiguous the tag, the less “reliable” its support. Thus, the overall tag confidence is calculated from the ontological support and the ambiguity score, with a tuning variable w acting as a weight that adjusts the influence of the ambiguity score on the total confidence. The result is not an absolute measure of tag confidence but rather a relative one that enables the ranking of the candidate tags.
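Since the transcript does not preserve the confidence formula, the sketch below uses a hypothetical linear discount, `support * (1 - w * ambiguity)`, purely to show how a weight w could trade off the two scores and produce a ranking. The function names and the default `w=0.5` are assumptions.

```python
def tag_confidence(support, ambiguity, w=0.5):
    """ASSUMED form, not the slide's formula: discount support by weighted ambiguity.

    Expects ambiguity and w in [0, 1]; w = 0 ignores ambiguity entirely.
    """
    return support * (1.0 - w * ambiguity)


def rank_tags(supports, ambiguities, w=0.5):
    """Return (tag, confidence) pairs, highest confidence first."""
    scored = {
        tag: tag_confidence(supports[tag], ambiguities.get(tag, 0.0), w)
        for tag in supports
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only; they are not results from the presentation.
ranking = rank_tags(
    supports={"Annie Hall": 2.6, "Manhattan": 0.23},
    ambiguities={"Annie Hall": 0.27, "Manhattan": 0.27},
)
print(ranking[0][0])  # Annie Hall
```

Because the score is only used for ranking, any monotone combination with the same direction of effect would serve the same illustrative purpose.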
  • 16. Example: Tagging a film review about the film “Steel”: “How's this for diminishing returns? In BATMAN AND ROBIN, George Clooney battled Arnold Schwarzenegger. In SPAWN, it was Michael Jai White versus John Leguizamo. In STEEL, the third and presumably final superhero stretch of the summer, Shaquille O'Neal dons a high-tech, hand-crafted suit of armor to combat the earth-shaking, world-shattering, super-duper-ultra evil menace of... Judd Nelson? ...”
  • 17. Experimental Evaluation: Dataset of 1000 film reviews randomly selected from IMDb. Film ontology derived from Freebase (~148,000 films, ~36,000 directors, ~145,000 actors, ~63,000 characters). For each review we generated two sets of recommended tags: one using a basic key phrase extraction methodology, in which we assumed that the most frequent film found within the text is the one the text is talking about, and one using our proposed method with the tagging context of the example. We then measured the success of each method as the number of cases in which the tag with the highest confidence was the correct one. Our method achieved a score of 89%, while the basic method achieved only 45%.
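The baseline and the success measure described here can be sketched in a few lines. The data below is made up for illustration and is unrelated to the reported 89% and 45% figures; `most_frequent_film` and `top1_accuracy` are hypothetical names.

```python
from collections import Counter


def most_frequent_film(film_mentions):
    """Baseline: pick the film mentioned most often in the text (None if no mentions)."""
    if not film_mentions:
        return None
    return Counter(film_mentions).most_common(1)[0][0]


def top1_accuracy(predicted, gold):
    """Fraction of reviews whose top-ranked tag equals the correct film."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


# Toy reviews: the film mentions found in each text, and the film actually reviewed.
mentions_per_review = [
    ["Spawn", "Steel", "Steel"],
    ["Manhattan", "Manhattan", "Annie Hall"],
]
gold = ["Steel", "Annie Hall"]

baseline = [most_frequent_film(m) for m in mentions_per_review]
print(top1_accuracy(baseline, gold))  # 0.5
```

The second toy review shows the failure mode the evaluation targets: a review can mention another film more often than the one it is about.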
  • 18. Summary: We proposed a novel method for automatically generating and recommending semantic tags for text documents in an effort to summarize the intended meaning of their content. Our approach is based on the customized utilization of domain-specific ontological relations for extracting and evaluating “evidence” from within the text that may identify the correct tag(s) in the given tagging scenario. An experimental evaluation on 1000 film reviews showed the method to be effective for the tag recommendation task: its top-ranked tag was correct in 89% of cases, versus 45% for a frequency-based baseline.
  • 19. Future Work: Establishing the effectiveness of our method in more complex and ambiguous domains and with larger datasets. Extending and enhancing the method in order to: cope with knowledge incompleteness and/or inconsistency; determine the values of the tagging context in an automated way; determine the exact number of appropriate tags for the document.
  • 20. Thank you… Panos Alexopoulos, Semantic Solutions Architect - Researcher, IMC Technologies SA, 2 Marathonos Str. & 360A Kifissias Av., 15233, Athens, Greece. T: +30 210 6927 378, F: +30 210 6926 813, E: palexopoulos@imc.com.gr, W: www.imc.com.gr