TU Graz – Knowledge Management Institute




                  Of Categorizers and Describers:
                   An Evaluation of Quantitative
                  Measures for Tagging Motivation
                             Christian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier

                                       Knowledge Management Institute and Know-Center
                                             Graz University of Technology, Austria




                                                Hypertext 2010, June 15th, 2010




                                               Introduction
            Lots of research on folksonomies, their structure and the
              resulting dynamics

            What we do not know are the reasons and motivations
             users have when they tag.

                                                            Question: Why do users tag?








                                                      Motivation

                 Knowing why users tag would help to answer a
                 number of current research questions:
                          What are possible improvements for tag recommendation?
                          What are suitable search terms for items in these systems?
                          How can we enhance ontology learning?
                          …
                                           There already exist models for tagging motivation such as [Nov2009] and [Heckner2009].


                                                                                    BUT: These models rely on expert judgements



                                      Automatic measures for inference of tagging motivation are important!







                                           Presentation Overview
            • Research questions

            • Two types of tagging motivation

            • Approximating tagging motivation

            • Experiments and results
                 – Quantitative Evaluation
                 – Qualitative Evaluation








                                                 Questions
            Can tagging motivation be approximated with statistical
              measures?

            Which measures allow us to infer whether a given user
             has a certain motivation?

            Which of these measures best differentiate between
             different types of tagging motivation?

            Does the distinction between the proposed types of tagging
              motivation influence the tagging process?





                                Types of Tagging Motivations
                                                                  Categorizer            Describer
                                              Goal                later browsing        later retrieval
                                    Change of vocabulary              costly                cheap
                                      Size of vocabulary              limited                open
                                              Tags                  subjective             objective
                                            Tag reuse                frequent                rare
                                           Tag purpose         mimicking taxonomy      descriptive labels




                                     In the “real world” users are driven by a
                                        combination of both motivations
                                           – e.g. using tags as descriptive labels while maintaining a
                                             few categories
                                                                                             [Körner2009]




                                                 Terminology
            Folksonomies are usually represented by tripartite graphs with
               hyper edges

            Three different disjoint sets:
                 – a set of users u ∈ U
                 – a set of tags t ∈ T
                 – a set of resources r ∈ R


            A folksonomy is defined as a set of annotations F ⊆ U × T × R

            A personomy is the reduction of a folksonomy F to a single user u

            A tag assignment (tas) is one specific triple of one user u, tag t
               and resource r.
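
The terminology above can be made concrete with a small data model. This is only a minimal sketch: the names `TagAssignment` and `personomy`, and the toy data, are made up for illustration and do not come from the slides.

```python
from collections import namedtuple

# One tag assignment (tas): a (user, tag, resource) triple.
TagAssignment = namedtuple("TagAssignment", ["user", "tag", "resource"])


def personomy(folksonomy, user):
    """Reduce a folksonomy (a set of tag assignments) to a single user."""
    return {tas for tas in folksonomy if tas.user == user}


# Hypothetical toy folksonomy F, a subset of U x T x R.
F = {
    TagAssignment("alice", "python", "r1"),
    TagAssignment("alice", "web", "r1"),
    TagAssignment("alice", "python", "r2"),
    TagAssignment("bob", "news", "r3"),
}

# The three sets U, T and R can be read off the triples.
users = {tas.user for tas in F}
tags = {tas.tag for tas in F}
resources = {tas.resource for tas in F}

P_alice = personomy(F, "alice")
```

Storing the folksonomy as a set of triples mirrors the definition F ⊆ U × T × R directly; each hyperedge of the tripartite graph is one element of the set.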
                       Approximating Tagging Motivation / 1
            Based on different intuitions, various measures for the
              differentiation were developed:

            • Tag/Resource Ratio (trr)
                 – How many tags does a user use?
                 – trr(u) = |T_u| / |R_u|

            • Orphaned Tag Ratio
                 – How many tags of a user's vocabulary are attached to only few resources?
                 – orphan(u) = |T_u^o| / |T_u|, with T_u^o = {t : |R(t)| ≤ n} and n = ⌈|R(t_max)| / 100⌉
                 – t_max is the tag the user has used most

            • Conditional Tag Entropy (cte)
                 – How well does a user “encode” resources with his tags?
                 – H(R|T) = − Σ_{r∈R} Σ_{t∈T} p(r,t) log2 p(r|t)
                 – cte = (H(R|T) − H_opt(R|T)) / H_opt(R|T)




                       Approximating Tagging Motivation / 2
            • Overlap Factor
                 – Are tags used as discriminatively as categories?
                 – overlap = 1 − |R_u| / |TAS_u|

            • Tag/Title Intersection Ratio (ttr)
                 – How likely does a user choose words from the title as tags?
                 – ttr = |T_u ∩ TW_u| / |TW_u|

                                            Categorizer            Describer             Proposed Measure
                     Goal                   later browsing         later retrieval
              Change of vocabulary          costly                 cheap
               Size of vocabulary           limited                open                  Tag/Resource Ratio
                     Tags                   subjective             objective             Tag/Title Intersection Ratio
                   Tag reuse                frequent               rare                  Orphaned Tag Ratio / Cond. Tag Entropy
                  Tag purpose               mimicking taxonomy     descriptive labels    Overlap Factor
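
The count-based measures on the two slides above (tag/resource ratio, orphaned tag ratio, overlap factor, tag/title intersection ratio) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: a personomy is assumed to be a non-empty list of (tag, resource) pairs, titles are tokenized by plain whitespace splitting, and all function names, the `stop_words` parameter, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def resources_per_tag(personomy):
    """Map each tag to the set of resources it was assigned to."""
    index = defaultdict(set)
    for tag, resource in personomy:
        index[tag].add(resource)
    return index


def tag_resource_ratio(personomy):
    """trr(u) = |T_u| / |R_u|"""
    tags = {t for t, _ in personomy}
    resources = {r for _, r in personomy}
    return len(tags) / len(resources)


def orphaned_tag_ratio(personomy):
    """orphan(u) = |T_u^o| / |T_u|, with threshold n = ceil(|R(t_max)| / 100)."""
    index = resources_per_tag(personomy)
    most_used = max(len(rs) for rs in index.values())  # |R(t_max)|
    n = math.ceil(most_used / 100)
    orphans = [t for t, rs in index.items() if len(rs) <= n]
    return len(orphans) / len(index)


def overlap_factor(personomy):
    """overlap = 1 - |R_u| / |TAS_u|"""
    assignments = set(personomy)  # distinct tag assignments of the user
    resources = {r for _, r in assignments}
    return 1 - len(resources) / len(assignments)


def tag_title_intersection_ratio(personomy, titles, stop_words=frozenset()):
    """ttr = |T_u ∩ TW_u| / |TW_u| (assumption: whitespace tokenization)."""
    tags = {t.lower() for t, _ in personomy}
    title_words = {w for title in titles for w in title.lower().split()}
    title_words -= stop_words
    return len(tags & title_words) / len(title_words)


# Hypothetical toy personomy and resource titles.
p = [("python", "r1"), ("web", "r1"), ("python", "r2"), ("todo", "r3")]
titles = ["Python tutorial", "Web frameworks in Python", "Shopping list"]

trr = tag_resource_ratio(p)                          # 3 tags / 3 resources
orphan = orphaned_tag_ratio(p)                       # "web" and "todo" are orphaned
overlap = overlap_factor(p)                          # 1 - 3/4
ttr = tag_title_intersection_ratio(p, titles, frozenset({"in"}))
```

On the toy data, trr is 1.0, the orphaned tag ratio is 2/3, the overlap factor is 0.25 and ttr is 1/3. All four functions look only at one user's data, so no pass over the complete folksonomy is needed.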




                     Approximating Tagging Motivation / 3
            Properties of the developed measures:

            • Agnostic to the semantics of the language used

            • Evaluate behavior of single user (as opposed to complete
                 folksonomy)
                 – no comparison to the complete folksonomy necessary


            • Inspect the usage of tags and NOT their semantic
                 meaning
                 – How often are tags used?
                 – How many tags are used on average to annotate a resource?
                 – How well does a user “encode” her resources with tags?
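
The “encoding” question can be illustrated with the standard conditional entropy H(R|T) = − Σ_r Σ_t p(r,t) log2 p(r|t). The sketch below estimates both probabilities from raw counts over one user's (tag, resource) pairs, which is an assumption of this example. The function names are hypothetical, and the ideal categorizer's entropy `h_opt` is taken as an input because the slides do not state how it is obtained.

```python
import math
from collections import Counter


def conditional_tag_entropy(personomy):
    """H(R|T) in bits, estimated from one user's (tag, resource) pairs."""
    pair_counts = Counter(personomy)
    tag_counts = Counter()
    for (tag, _), count in pair_counts.items():
        tag_counts[tag] += count
    total = sum(pair_counts.values())

    h = 0.0
    for (tag, _), count in pair_counts.items():
        p_joint = count / total            # p(r, t)
        p_cond = count / tag_counts[tag]   # p(r | t)
        h -= p_joint * math.log2(p_cond)
    return h


def normalized_cte(h, h_opt):
    """cte = (H(R|T) - H_opt(R|T)) / H_opt(R|T); h_opt is supplied by the caller."""
    return (h - h_opt) / h_opt


# Hypothetical toy personomies.
reused = [("python", "r1"), ("python", "r2"), ("web", "r3")]
unique = [("a", "r1"), ("b", "r2"), ("c", "r3")]

h_reused = conditional_tag_entropy(reused)
h_unique = conditional_tag_entropy(unique)
```

A tag that points to exactly one resource contributes nothing to the sum, so `h_unique` is 0, while the reused tag "python" leaves one bit of uncertainty on two of the three assignments, giving `h_reused` = 2/3.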





                                           Experimental Setup
            Delicious dataset
                 – part of a collection of tagging datasets which we crawled from May to June
                   2009
                 – Captured folksonomy consists of:
                      • 896 users
                      • 184,746 tags
                      • 1,089,653 resources


            Requirements for the dataset
                 – Holding complete personomies
                      • all tags and resources which were publicly available
                 – Chronological order of the posts should be conserved
                      • To capture changes in tagging behavior
                 – “Mostly inactive” users who have annotated only few resources should be
                   excluded
                      • The lower bound of tagged resources was 1000 in the case of the Delicious dataset
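
The dataset requirements above amount to a simple filtering step. The sketch below is only an illustration: the post layout `(user, timestamp, resource, tags)`, the function name and the toy data are assumptions, not the format of the crawled dataset.

```python
from collections import defaultdict

MIN_RESOURCES = 1000  # lower bound stated for the Delicious dataset


def active_personomies(posts, min_resources=MIN_RESOURCES):
    """Keep users with at least `min_resources` distinct resources.

    posts: iterable of (user, timestamp, resource, tags) tuples.
    Returns {user: posts sorted chronologically}.
    """
    by_user = defaultdict(list)
    for post in posts:
        by_user[post[0]].append(post)

    kept = {}
    for user, user_posts in by_user.items():
        distinct_resources = {post[2] for post in user_posts}
        if len(distinct_resources) >= min_resources:
            # Preserve chronological order to capture changes in behavior.
            kept[user] = sorted(user_posts, key=lambda post: post[1])
    return kept


# Hypothetical toy data, filtered with a threshold of 2 instead of 1000.
toy_posts = [
    ("alice", 20, "r2", ["web"]),
    ("alice", 10, "r1", ["python"]),
    ("bob", 5, "r3", ["news"]),
]
kept = active_personomies(toy_posts, min_resources=2)
```

With the toy threshold, only "alice" is kept and her posts come back ordered by timestamp.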

Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?
Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?Clay Spinuzzi
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Miningbutest
 
A Survey on Decision Support Systems in Social Media
A Survey on Decision Support Systems in Social MediaA Survey on Decision Support Systems in Social Media
A Survey on Decision Support Systems in Social MediaEditor IJCATR
 

Similaire à Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation (20)

Extracting Semantics from Crowds
Extracting Semantics from CrowdsExtracting Semantics from Crowds
Extracting Semantics from Crowds
 
Towards Understanding the Motivation Behind Tagging
Towards Understanding the Motivation Behind TaggingTowards Understanding the Motivation Behind Tagging
Towards Understanding the Motivation Behind Tagging
 
Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity
Stop thinking, start tagging - Tag Semantics emerge from Collaborative VerbosityStop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity
Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity
 
Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter
Meaning as Collective Use: Predicting Semantic Hashtag Categories on TwitterMeaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter
Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter
 
Improving Tag Clouds
Improving Tag CloudsImproving Tag Clouds
Improving Tag Clouds
 
Pragmatic evaluation of folksonomies
Pragmatic evaluation of folksonomiesPragmatic evaluation of folksonomies
Pragmatic evaluation of folksonomies
 
Harnessing Folksonomies for Resource Classification
Harnessing Folksonomies for Resource ClassificationHarnessing Folksonomies for Resource Classification
Harnessing Folksonomies for Resource Classification
 
Model-Driven Research in Social Computing
Model-Driven Research in Social ComputingModel-Driven Research in Social Computing
Model-Driven Research in Social Computing
 
Topic detecton by clustering and text mining
Topic detecton by clustering and text miningTopic detecton by clustering and text mining
Topic detecton by clustering and text mining
 
IRJET - Deep Collaborrative Filtering with Aspect Information
IRJET - Deep Collaborrative Filtering with Aspect InformationIRJET - Deep Collaborrative Filtering with Aspect Information
IRJET - Deep Collaborrative Filtering with Aspect Information
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
SIRTEL'08 Cross Repository Tag Usage
SIRTEL'08 Cross Repository Tag UsageSIRTEL'08 Cross Repository Tag Usage
SIRTEL'08 Cross Repository Tag Usage
 
Using Controlled Vocabularies
Using Controlled VocabulariesUsing Controlled Vocabularies
Using Controlled Vocabularies
 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews
 
Extracting semantics from crowds
Extracting semantics from crowdsExtracting semantics from crowds
Extracting semantics from crowds
 
Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?
Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?
Dartmouth discussion: What's wrong with "What's wrong with CHAT?"?
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
A Survey on Decision Support Systems in Social Media
A Survey on Decision Support Systems in Social MediaA Survey on Decision Support Systems in Social Media
A Survey on Decision Support Systems in Social Media
 

Dernier

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Dernier (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

  • 1. TU Graz – Knowledge Management Institute Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation Christian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier Knowledge Management Institute and Know-Center Graz University of Technology, Austria Hypertext 2010, June 15th, 2010 1
  • 2. Introduction
      There is a lot of research on folksonomies, their structure and the resulting dynamics.
      What we do not know are the reasons and motivations users have when they tag.
      Question: Why do users tag?
  • 3. Motivation
      Knowing why users tag would help to answer a number of current research questions:
      – What are possible improvements for tag recommendation?
      – What are suitable search terms for items in these systems?
      – How can we enhance ontology learning?
      – …
      Models for tagging motivation already exist, such as [Nov2009] and [Heckner2009].
      BUT: These models rely on expert judgements.
      Automatic measures for the inference of tagging motivation are important!
  • 4. Presentation Overview
      – Research questions
      – Two types of tagging motivation
      – Approximating tagging motivation
      – Experiments and results
          – Quantitative Evaluation
          – Qualitative Evaluation
  • 5. Questions
      – Can tagging motivation be approximated with statistical measures?
      – Which measures allow inferring whether a given user has a certain motivation?
      – Which of these measures best differentiate between the types of tagging motivation?
      – Does the distinction between the proposed types of tagging motivation influence the tagging process?
  • 6. Types of Tagging Motivations
                              Categorizer           Describer
      Goal                    later browsing        later retrieval
      Change of vocabulary    costly                cheap
      Size of vocabulary      limited               open
      Tags                    subjective            objective
      Tag reuse               frequent              rare
      Tag purpose             mimicking taxonomy    descriptive labels
      In the “real world” users are driven by a combination of both motivations, e.g. using tags as descriptive labels while maintaining a few categories [Körner2009].
  • 7. Terminology
      Folksonomies are usually represented by tripartite graphs with hyper edges.
      Three different disjoint sets:
      – a set of users u ∈ U
      – a set of tags t ∈ T
      – a set of resources r ∈ R
      A folksonomy is defined as a set of annotations F ⊆ U × T × R.
      A personomy is the reduction of a folksonomy F to a single user u.
      A tag assignment (tas) is one specific triple of one user u, tag t and resource r.
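The terminology above maps naturally onto a small data structure. The following is a minimal sketch, assuming a folksonomy is held in memory as a set of (user, tag, resource) triples; the names `TagAssignment` and `personomy` and the example URLs are illustrative and do not come from the slides.

```python
from typing import NamedTuple, Set


class TagAssignment(NamedTuple):
    """One tag assignment (tas): a triple of user, tag and resource."""
    user: str
    tag: str
    resource: str


def personomy(folksonomy: Set[TagAssignment], user: str) -> Set[TagAssignment]:
    """Reduce a folksonomy F to the tag assignments of a single user u."""
    return {tas for tas in folksonomy if tas.user == user}


# Hypothetical toy folksonomy with two users.
folksonomy = {
    TagAssignment("alice", "python", "http://example.org/a"),
    TagAssignment("alice", "tutorial", "http://example.org/a"),
    TagAssignment("bob", "python", "http://example.org/b"),
}
alice = personomy(folksonomy, "alice")
```

Because all measures discussed in the talk are evaluated per user, a reduction of this kind would be the natural first step before computing any of them.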
  • 8. Approximating Tagging Motivation / 1
      Based on different intuitions, various measures for the differentiation were developed:
      – Tag/Resource Ratio (trr): How many tags does a user use? The ratio relates the vocabulary size of a user to the total number of resources annotated by this user:
            $trr(u) = \frac{|T_u|}{|R_u|}$
        Describers, who use a variety of different tags for their resources, can be expected to score higher values than categorizers, who use fewer tags.
      – Orphaned Tag Ratio: How many tags of a user's vocabulary are attached to only few resources?
            $orphan(u) = \frac{|T_u^o|}{|T_u|}, \quad T_u^o = \{t \mid |R(t)| \le n\}, \quad n = \left\lceil \frac{|R(t_{max})|}{100} \right\rceil$
        where $t_{max}$ denotes the tag that the user used the most. Categorizers are expected to be represented by values closer to 0, because orphaned tags would introduce noise to their personal taxonomy. Describers are expected to be represented by values closer to 1, because they tag resources in a verbose and descriptive way and do not mind the introduction of orphaned tags to their vocabulary.
      – Conditional Tag Entropy (cte): How well does a user “encode” resources with his tags?
            $H(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r,t) \log_2 p(r|t)$
        For categorizers, useful tags should be maximally discriminative with regard to the resources they are assigned to, which allows them to use tags effectively for navigation and browsing. When tagging is viewed as an encoding process, entropy can be considered a measure of the suitability of tags for this task.
      [Figure 1: Tag cloud example of a categorizer. Frequency among tags is balanced, a potential indicator for using the tag set as an aid for navigation.]
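The three measures on this slide (tag/resource ratio |T_u| / |R_u|, orphaned tag ratio with threshold n = ⌈|R(t_max)| / 100⌉, and the conditional entropy H(R|T)) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a non-empty personomy given as (tag, resource) pairs, and it assumes that every tag assignment is equally likely, so that p(r,t) = 1/|TAS_u| and p(r|t) = 1/|R(t)|; the slides do not spell out how the probabilities are estimated. All function names are hypothetical.

```python
import math
from collections import defaultdict


def resources_per_tag(pairs):
    """Map each tag t to the set R(t) of resources it was assigned to."""
    by_tag = defaultdict(set)
    for tag, resource in pairs:
        by_tag[tag].add(resource)
    return by_tag


def trr(pairs):
    """Tag/resource ratio: vocabulary size divided by number of resources."""
    tags = {t for t, _ in pairs}
    resources = {r for _, r in pairs}
    return len(tags) / len(resources)


def orphan_ratio(pairs):
    """Share of tags attached to at most n resources, n = ceil(|R(t_max)| / 100)."""
    by_tag = resources_per_tag(pairs)
    n = math.ceil(max(len(rs) for rs in by_tag.values()) / 100)
    orphans = [t for t, rs in by_tag.items() if len(rs) <= n]
    return len(orphans) / len(by_tag)


def conditional_entropy(pairs):
    """H(R|T), ASSUMING p(r,t) = 1/|TAS_u| and p(r|t) = 1/|R(t)|."""
    unique_pairs = set(pairs)
    by_tag = resources_per_tag(unique_pairs)
    total = len(unique_pairs)
    return -sum((1 / total) * math.log2(1 / len(by_tag[t])) for t, _ in unique_pairs)


# Hypothetical toy personomy: tag "a" covers two resources, "b" and "c" share one.
toy = [("a", "r1"), ("a", "r2"), ("b", "r3"), ("c", "r3")]
toy_trr = trr(toy)
toy_orphan = orphan_ratio(toy)
toy_entropy = conditional_entropy(toy)
```

On such a tiny personomy the threshold n is 1, so every tag used on a single resource counts as orphaned; on personomies of the size used in the experiments the threshold scales with the most frequently used tag.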
  • 9. Approximating Tagging Motivation / 2
      – Conditional Tag Entropy (cte), continued: A categorizer would have a strong incentive to maintain high tag entropy in her tag cloud, whereas a describer would have little interest in doing so, as tags are not used for navigation at all. The conditional entropy of a user is put in relation to the conditional entropy of an ideal categorizer:
            $cte = \frac{H(R|T) - H_{opt}(R|T)}{H_{opt}(R|T)}$
      – Overlap Factor: Are tags used as discriminatively as categories? When users assign more than one tag per resource on average, the resource sets of their tags may overlap. The overlap factor relates the number of all resources to the total number of tag assignments of a user:
            $overlap = 1 - \frac{|R_u|}{|TAS_u|}$
        Categorizers would be interested in keeping this overlap relatively low in order to produce discriminative categories, i.e. categories that are free from intersections. Describers would not care about a possibly high overlap factor, since they do not use tags for navigation.
      – Tag/Title Intersection Ratio (ttr): How likely does a user choose words from the title as tags? The titles occurring in a personomy are tokenized to the set of title words $TW_u$, the title words are filtered using the stop-word list packaged with the Snowball stemmer, and the size of the intersection is normalized by the size of the set of title words:
            $ttr = \frac{|T_u \cap TW_u|}{|TW_u|}$
      Intuitions and the measures proposed for them:
                              Categorizer           Describer             Proposed Measure
      Goal                    later browsing        later retrieval
      Change of vocabulary    costly                cheap
      Size of vocabulary      limited               open                  Tag/Resource Ratio
      Tags                    subjective            objective             Tag/Title Intersection Ratio
      Tag reuse               frequent              rare                  Orphaned Tag Ratio / Cond. Tag Entropy
      Tag purpose             mimicking taxonomy    descriptive labels    Overlap Factor
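The overlap factor 1 − |R_u| / |TAS_u| and the tag/title intersection ratio |T_u ∩ TW_u| / |TW_u| can be sketched as follows. This is a hedged illustration rather than the authors' code: the tokenizer is a simple regular expression, and `STOP_WORDS` is a tiny stand-in for the Snowball stop-word list mentioned in the talk. Both are assumptions, as are all function names.

```python
import re

# ASSUMPTION: a tiny illustrative stop-word set, not the Snowball list.
STOP_WORDS = {"a", "an", "the", "of", "and", "for", "to", "in"}


def overlap_factor(pairs):
    """1 - |R_u| / |TAS_u| for a personomy given as (tag, resource) pairs."""
    unique_pairs = set(pairs)
    resources = {r for _, r in unique_pairs}
    return 1 - len(resources) / len(unique_pairs)


def title_words(titles):
    """Tokenize titles into lower-cased words and drop stop words."""
    words = set()
    for title in titles:
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        words.update(w for w in tokens if w not in STOP_WORDS)
    return words


def ttr(tags, titles):
    """Tag/title intersection ratio: |T_u ∩ TW_u| / |TW_u|."""
    tw = title_words(titles)
    return len({t.lower() for t in tags} & tw) / len(tw)


# Hypothetical toy data: two resources, three tag assignments.
toy_pairs = [("python", "r1"), ("tutorial", "r1"), ("python", "r2")]
toy_overlap = overlap_factor(toy_pairs)

toy_tags = {"python", "tutorial", "toread"}
toy_titles = ["The Python Tutorial", "Introduction to Python"]
toy_ttr = ttr(toy_tags, toy_titles)
```

The normalized conditional tag entropy is left out on purpose: it needs the entropy of an ideal categorizer, H_opt(R|T), whose construction is not given in the transcript.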
  • 10. Approximating Tagging Motivation / 3
      Properties of the developed measures:
      – Agnostic to the semantics of the language used
      – Evaluate the behavior of a single user; no comparison to the complete folksonomy is necessary
      – Inspect the usage of tags and NOT their semantic meaning
          – How often are tags used?
          – How many tags are used on average to annotate a resource?
          – How well does a user “encode” her resources with tags?
  • 11. Experimental Setup
      Delicious dataset
      – Part of a collection of tagging datasets which we crawled from May to June 2009
      – The captured folksonomy consists of:
          – 896 users
          – 184,746 tags
          – 1,089,653 resources
      Requirements for the dataset
      – Complete personomies: all tags and resources which were publicly available
      – Chronological order of the posts should be conserved, to capture changes in tagging behavior
      – “Mostly inactive” users who do not have a lot of annotated resources should be neglected; the lower bound of tagged resources was 1000 in the case of the Delicious dataset
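The dataset requirements above amount to a filtering step. The sketch below shows one way it could look, assuming posts are available as (timestamp, user, resource, tags) tuples; this layout and the function name `active_personomies` are hypothetical and not taken from the talk. Only the lower bound of 1000 tagged resources comes from the slide.

```python
from collections import defaultdict

MIN_RESOURCES = 1000  # lower bound reported for the Delicious dataset


def active_personomies(posts, min_resources=MIN_RESOURCES):
    """Group posts by user, drop mostly inactive users, keep chronological order.

    ASSUMPTION: each post is a (timestamp, user, resource, tags) tuple.
    """
    by_user = defaultdict(list)
    for post in posts:
        by_user[post[1]].append(post)

    result = {}
    for user, user_posts in by_user.items():
        distinct_resources = {p[2] for p in user_posts}
        if len(distinct_resources) >= min_resources:
            # Sort by timestamp so that changes in tagging behavior stay visible.
            result[user] = sorted(user_posts, key=lambda p: p[0])
    return result


# Hypothetical toy input with a lowered threshold for illustration.
toy_posts = [
    (3, "alice", "r2", ["x"]),
    (1, "alice", "r1", ["y"]),
    (2, "bob", "r1", ["z"]),
]
kept = active_personomies(toy_posts, min_resources=2)
```

With the threshold lowered to 2 for the toy data, only the user with two distinct resources survives, and her posts come back ordered by timestamp.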