TU Graz – Knowledge Management Institute




                  Of Categorizers and Describers:
                   An Evaluation of Quantitative
                  Measures for Tagging Motivation
                             Christian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier

                                       Knowledge Management Institute and Know-Center
                                             Graz University of Technology, Austria




                                                Hypertext 2010, June 15th, 2010




                                               Introduction
            Lots of research on folksonomies, their structure and the
              resulting dynamics

            What we do not know are the reasons and motivations
             users have when they tag.

                                                            Question: Why do users tag?








                                                      Motivation

                 Knowledge about why users tag would help to answer a
                 number of current research questions:
                          What are possible improvements for tag recommendation?
                          What are suitable search terms for items in these systems?
                          How can we enhance ontology learning?
                          …
                                           There already exist models for tagging motivation such as [Nov2009] and [Heckner2009].


                                                                                    BUT: These models rely on expert judgements



                                      Automatic measures for inferring tagging motivation are therefore important!







                                           Presentation Overview
            • Research questions

            • Two types of tagging motivation

            • Approximating tagging motivation

            • Experiments and results
                 – Quantitative Evaluation
                 – Qualitative Evaluation








                                                 Questions
            Can tagging motivation be approximated with statistical
              measures?

            Which measures enable us to infer whether a
             given user has a certain motivation?

            Which of these measures perform best to differentiate
             between different types of tagging motivation?

            Does the distinction between the proposed tagging motivation
              types have an influence on the tagging process?





                                Types of Tagging Motivations
                                                                  Categorizer            Describer
                                              Goal                later browsing        later retrieval
                                    Change of vocabulary              costly                cheap
                                      Size of vocabulary              limited                open
                                              Tags                  subjective             objective
                                            Tag reuse                frequent                rare
                                           Tag purpose         mimicking taxonomy      descriptive labels




                                     In the “real world” users are driven by a
                                        combination of both motivations
                                           – e.g. using tags as descriptive labels while maintaining a
                                             few categories
                                                                                             [Körner2009]




                                                 Terminology
            Folksonomies are usually represented by tripartite graphs with
               hyperedges

            Three finite, disjoint sets:
                 – a set of users u ∈ U
                 – a set of tags t ∈ T
                 – a set of resources r ∈ R


            A folksonomy is defined as a set of annotations F ⊆ U × T × R

            A personomy is the reduction of a folksonomy F to a single user u

            A tag assignment (tas) is one specific triple (u, t, r) of a user u,
               a tag t and a resource r.
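The set-based definitions of folksonomy, personomy and tag assignment can be sketched in Python, assuming that users, tags and resources are identified by plain strings. The names TagAssignment, personomy, users_of, tags_of and resources_of are hypothetical and only illustrate the model; they are not taken from the authors' implementation.

```python
from typing import NamedTuple, Set


class TagAssignment(NamedTuple):
    """A tag assignment (tas): one triple of user u, tag t and resource r."""
    user: str
    tag: str
    resource: str


# A folksonomy F is a set of tag assignments, i.e. a subset of U x T x R.
Folksonomy = Set[TagAssignment]


def personomy(folksonomy: Folksonomy, user: str) -> Folksonomy:
    """Reduce the folksonomy to the tag assignments of a single user."""
    return {tas for tas in folksonomy if tas.user == user}


def users_of(f: Folksonomy) -> Set[str]:
    """Project the tag assignments onto the user set U."""
    return {tas.user for tas in f}


def tags_of(f: Folksonomy) -> Set[str]:
    """Project the tag assignments onto the tag set T."""
    return {tas.tag for tas in f}


def resources_of(f: Folksonomy) -> Set[str]:
    """Project the tag assignments onto the resource set R."""
    return {tas.resource for tas in f}


if __name__ == "__main__":
    F = {
        TagAssignment("alice", "python", "r1"),
        TagAssignment("alice", "tutorial", "r1"),
        TagAssignment("bob", "python", "r2"),
    }
    P = personomy(F, "alice")
    print(len(P), sorted(tags_of(P)), sorted(resources_of(P)))
```

Representing F as a set of immutable triples keeps the "set of annotations" reading literal: a duplicate assignment by the same user simply collapses into one element.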
                      Approximating Tagging Motivation / 1
            • Tag/Resource Ratio (trr)
                 – How many tags does a user use?
                 – $\mathrm{trr}(u) = \frac{|T_u|}{|R_u|}$, relating the vocabulary size to the total number of resources annotated by the user

            • Orphaned Tag Ratio
                 – How many tags of a user's vocabulary are attached to only a few resources?
                 – $\mathrm{orphan}(u) = \frac{|T_u^o|}{|T_u|}$, with $T_u^o = \{t \mid |R(t)| \leq n\}$ and $n = \frac{|R(t_{max})|}{100}$, where $t_{max}$ is the tag the user used most

            • Conditional Tag Entropy (cte)
                 – How well does a user “encode” resources with his tags?
                 – $H(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r,t) \log_2 p(r|t)$
                 – $\mathrm{cte}(u) = \frac{H(R|T) - H_{opt}(R|T)}{H_{opt}(R|T)}$, where $H_{opt}(R|T)$ is the conditional entropy of an ideal categorizer

            [Figure: tag cloud example of a categorizer; frequency among tags is balanced]

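The first three measures, trr(u) = |T_u|/|R_u|, orphan(u) = |T_u^o|/|T_u| with T_u^o = {t : |R(t)| ≤ n} and n = |R(t_max)|/100, and cte(u) = (H(R|T) − H_opt(R|T))/H_opt(R|T), can be sketched in Python for a single personomy given as (tag, resource) pairs. Several details are assumptions not stated on the slide: p(r, t) is estimated as the relative frequency of each distinct (tag, resource) pair, so that p(r|t) = 1/|R(t)|; the threshold n is used without rounding; and H_opt(R|T) is passed in by the caller, because its derivation for an ideal categorizer is not reproduced here. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# One personomy as (tag, resource) pairs; duplicate pairs are ignored.
Pair = Tuple[str, str]


def resources_per_tag(pairs: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Map each tag t to R(t), the resources the user annotated with t."""
    r_of_t: Dict[str, Set[str]] = defaultdict(set)
    for tag, resource in pairs:
        r_of_t[tag].add(resource)
    return dict(r_of_t)


def trr(pairs: Iterable[Pair]) -> float:
    """Tag/resource ratio: |T_u| / |R_u|."""
    pairs = set(pairs)
    tags = {t for t, _ in pairs}
    resources = {r for _, r in pairs}
    return len(tags) / len(resources)


def orphan_ratio(pairs: Iterable[Pair]) -> float:
    """Share of tags attached to at most n resources, n = |R(t_max)| / 100."""
    r_of_t = resources_per_tag(pairs)
    # t_max is the tag attached to the most resources (assumption: n is not rounded).
    n = max(len(rs) for rs in r_of_t.values()) / 100
    orphaned = [t for t, rs in r_of_t.items() if len(rs) <= n]
    return len(orphaned) / len(r_of_t)


def conditional_entropy(pairs: Iterable[Pair]) -> float:
    """H(R|T) = -sum p(r,t) log2 p(r|t) over the user's distinct pairs."""
    pairs = set(pairs)
    total = len(pairs)
    tag_counts = Counter(t for t, _ in pairs)  # |R(t)| for each tag
    h = 0.0
    for tag, _resource in pairs:
        p_rt = 1 / total                   # p(r, t): relative pair frequency
        p_r_given_t = 1 / tag_counts[tag]  # p(r | t) = p(r, t) / p(t)
        h -= p_rt * math.log2(p_r_given_t)
    return h


def cte(h: float, h_opt: float) -> float:
    """Conditional tag entropy relative to a caller-supplied H_opt(R|T)."""
    return (h - h_opt) / h_opt
```

For a user who attaches one tag to 100 resources and two further tags to a single resource each, n = 1, so both rare tags count as orphaned and orphan_ratio returns 2/3. Pairs that never occur have p(r, t) = 0 and contribute nothing to the sum, which is why the loop only visits observed pairs.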
                       Approximating Tagging Motivation / 2
            • Overlap Factor
                 – Are tags used as discriminative categories?
                 – $\mathrm{overlap}(u) = 1 - \frac{|R_u|}{|TAS_u|}$, relating the number of resources to the total number of tag assignments of the user

            • Tag/Title Intersection Ratio (ttr)
                 – How likely is a user to choose words from the title as tags?
                 – $\mathrm{ttr}(u) = \frac{|T_u \cap TW_u|}{|TW_u|}$, where $TW_u$ is the set of stop-word filtered title words in the user's personomy

                                       Categorizer           Describer            Measure
                  Goal                 later browsing        later retrieval
        Change of vocabulary           costly                cheap
          Size of vocabulary           limited               open                 Tag/Resource Ratio
                  Tags                 subjective            objective            Tag/Title Intersection Ratio
               Tag reuse               frequent              rare                 Orphaned Tag Ratio / Cond. Tag Entropy
              Tag purpose              mimicking taxonomy    descriptive labels   Overlap Factor
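The remaining two measures, overlap(u) = 1 − |R_u|/|TAS_u| and ttr(u) = |T_u ∩ TW_u|/|TW_u|, can be sketched the same way. The title words TW_u are tokenized and filtered with a stop-word list (the source mentions the one packaged with the Snowball stemmer); here a small hypothetical STOP_WORDS set and a lowercase `\w+` tokenizer stand in for it, and no stemming is applied. Function names are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Hypothetical stand-in for a real stop-word list such as Snowball's.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with"}


def overlap(pairs: Iterable[Tuple[str, str]]) -> float:
    """Overlap factor: 1 - |R_u| / |TAS_u| for one user's (tag, resource) pairs."""
    tas = set(pairs)
    resources = {r for _, r in tas}
    return 1 - len(resources) / len(tas)


def title_words(titles: Iterable[str]) -> Set[str]:
    """TW_u: lowercase word tokens of all resource titles, minus stop words."""
    words: Set[str] = set()
    for title in titles:
        tokens = re.findall(r"\w+", title.lower())
        words.update(w for w in tokens if w not in STOP_WORDS)
    return words


def ttr(tags: Iterable[str], titles: Iterable[str]) -> float:
    """Tag/title intersection ratio: |T_u ∩ TW_u| / |TW_u|."""
    tw = title_words(titles)
    normalized_tags = {t.lower() for t in tags}
    return len(normalized_tags & tw) / len(tw)
```

With exactly one tag per resource the overlap factor is 0; a user who assigns two tags to every resource reaches 0.5, since half of the tag assignments then fall on resources that are already tagged.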




                     Approximating Tagging Motivation / 3
            Properties of the developed measures:

            • Agnostic to the semantics of the language used

            • Evaluate the behavior of a single user (as opposed to the complete
                 folksonomy)
                 – no comparison with the complete folksonomy necessary


            • Inspect the usage of tags and NOT their semantic
                 meaning
                 – How often are tags used?
                 – How many tags are used on average to annotate a resource?
                 – How well does a user “encode” her resources with tags?





                                           Experimental Setup
            Delicious dataset
                 – part of a collection of tagging datasets which we crawled from May to June
                   2009
                 – Captured folksonomy consists of:
                      • 896 users
                      • 184,746 tags
                      • 1,089,653 resources


            Requirements for the dataset
                 – Complete personomies
                      • all tags and resources that were publicly available
                 – Chronological order of the posts should be preserved
                      • to capture changes in tagging behavior
                 – “Mostly inactive” users with few annotated resources should be
                   excluded
                      • The lower bound was 1,000 tagged resources per user in the case of the Delicious dataset
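The dataset requirements can be sketched as a preprocessing step that groups posts into personomies, keeps only users with at least 1,000 tagged resources, and sorts each personomy chronologically. The Post record and the prepare_personomies function are hypothetical; the crawler and data format actually used for the Delicious dataset are not described on the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List

# Lower bound of tagged resources per user used for the Delicious dataset.
MIN_RESOURCES = 1000


@dataclass
class Post:
    """One post: a user annotating a resource with tags at a point in time."""
    user: str
    resource: str
    tags: List[str]
    timestamp: datetime


def prepare_personomies(posts: Iterable[Post],
                        min_resources: int = MIN_RESOURCES) -> Dict[str, List[Post]]:
    """Group posts by user, drop mostly inactive users, keep chronological order."""
    by_user: Dict[str, List[Post]] = {}
    for post in posts:
        by_user.setdefault(post.user, []).append(post)

    personomies: Dict[str, List[Post]] = {}
    for user, user_posts in by_user.items():
        # Count distinct resources, not posts, against the lower bound.
        if len({p.resource for p in user_posts}) >= min_resources:
            # Sorting by timestamp preserves the order in which the user tagged.
            personomies[user] = sorted(user_posts, key=lambda p: p.timestamp)
    return personomies
```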

                                              Hypertext 2010, June 15th, 2010
                                                                                                       11
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation
Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

Contenu connexe

En vedette

Zensar’s Blockchain enablement framework
Zensar’s Blockchain enablement frameworkZensar’s Blockchain enablement framework
Zensar’s Blockchain enablement frameworkZensar Technologies Ltd.
 
CRMC 2013 "Power to the People"
CRMC 2013   "Power to the People"CRMC 2013   "Power to the People"
CRMC 2013 "Power to the People"dunnhumby
 
Spatial Processing with SAP HANA
Spatial Processing with SAP HANA Spatial Processing with SAP HANA
Spatial Processing with SAP HANA SAP Technology
 
Dunnhumby forrester webinar 16 11 2016
Dunnhumby forrester webinar 16 11 2016Dunnhumby forrester webinar 16 11 2016
Dunnhumby forrester webinar 16 11 2016dunnhumby
 
CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...
CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...
CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...Instantly
 
Are Your CPG Brands Maximizing the Return on Your Digital Investment?
Are Your CPG Brands Maximizing the Return on Your Digital Investment?Are Your CPG Brands Maximizing the Return on Your Digital Investment?
Are Your CPG Brands Maximizing the Return on Your Digital Investment?dunnhumby
 
Business Intelligence in Retail Industry
Business Intelligence in Retail IndustryBusiness Intelligence in Retail Industry
Business Intelligence in Retail IndustryVõ Duy Tuấn
 
SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...
SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...
SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...Cloneskills
 
IBM - Full year Go-to-market plan template
IBM - Full year Go-to-market plan templateIBM - Full year Go-to-market plan template
IBM - Full year Go-to-market plan templateArrow ECS UK
 

En vedette (10)

Zensar’s Blockchain enablement framework
Zensar’s Blockchain enablement frameworkZensar’s Blockchain enablement framework
Zensar’s Blockchain enablement framework
 
CRMC 2013 "Power to the People"
CRMC 2013   "Power to the People"CRMC 2013   "Power to the People"
CRMC 2013 "Power to the People"
 
Spatial Processing with SAP HANA
Spatial Processing with SAP HANA Spatial Processing with SAP HANA
Spatial Processing with SAP HANA
 
Dunnhumby forrester webinar 16 11 2016
Dunnhumby forrester webinar 16 11 2016Dunnhumby forrester webinar 16 11 2016
Dunnhumby forrester webinar 16 11 2016
 
CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...
CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...
CPG Innovation From Ideation to Aisle: New Techniques for Staying Ahead of Co...
 
Are Your CPG Brands Maximizing the Return on Your Digital Investment?
Are Your CPG Brands Maximizing the Return on Your Digital Investment?Are Your CPG Brands Maximizing the Return on Your Digital Investment?
Are Your CPG Brands Maximizing the Return on Your Digital Investment?
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
Business Intelligence in Retail Industry
Business Intelligence in Retail IndustryBusiness Intelligence in Retail Industry
Business Intelligence in Retail Industry
 
SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...
SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...
SAP HANA & HADOOP Implementation - Predictive Analytics – CPG and Retail on U...
 
IBM - Full year Go-to-market plan template
IBM - Full year Go-to-market plan templateIBM - Full year Go-to-market plan template
IBM - Full year Go-to-market plan template
 

Similar to Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

Extracting Semantics from Crowds (Markus Strohmaier)
Towards Understanding the Motivation Behind Tagging (Christian Körner)
Stop thinking, start tagging - Tag Semantics emerge from Collaborative Verbosity (Inovex GmbH)
Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter (Gabriela Agustini)
Improving Tag Clouds
Pragmatic evaluation of folksonomies (Markus Strohmaier)
Harnessing Folksonomies for Resource Classification (azubiaga)
Model-Driven Research in Social Computing (Ed Chi)
Topic detecton by clustering and text mining (IRJET Journal)
IRJET - Deep Collaborrative Filtering with Aspect Information (IRJET Journal)
IJCER (www.ijceronline.com) International Journal of computational Engineerin... (ijceronline)
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval (Mauro Dragoni)
SIRTEL'08 Cross Repository Tag Usage (Riina Vuorikari)
Using Controlled Vocabularies
Mining Opinion Features in Customer Reviews (IJCERT JOURNAL)
Extracting semantics from crowds (Markus Strohmaier)
Dartmouth discussion: What's wrong with "What's wrong with CHAT?"? (Clay Spinuzzi)
IRJET - Twitter Sentiment Analysis using Machine Learning (IRJET Journal)
On Machine Learning and Data Mining (butest)
A Survey on Decision Support Systems in Social Media (Editor IJCATR)

Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation

  • 1. TU Graz – Knowledge Management Institute. Of Categorizers and Describers: An Evaluation of Quantitative Measures for Tagging Motivation. Christian Körner, Roman Kern, Hans-Peter Grahsl, Markus Strohmaier. Knowledge Management Institute and Know-Center, Graz University of Technology, Austria. Hypertext 2010, June 15th, 2010.
  • 2. Introduction. There is a lot of research on folksonomies, their structure and the resulting dynamics. What we do not know are the reasons and motivations users have when they tag. Question: Why do users tag?
  • 3. Motivation. Knowing why users tag would help to answer a number of current research questions: What are possible improvements for tag recommendation? What are suitable search terms for items in these systems? How can we enhance ontology learning? … Models of tagging motivation already exist, such as [Nov2009] and [Heckner2009]. BUT: these models rely on expert judgements. Automatic measures for inferring tagging motivation are therefore important!
  • 4. Presentation Overview: • Research questions • Two types of tagging motivation • Approximating tagging motivation • Experiments and results: quantitative evaluation, qualitative evaluation
  • 5. Questions: Can tagging motivation be approximated with statistical measures? Which measures allow us to infer whether a given user has a certain motivation? Which of these measures best differentiate between the types of tagging motivation? Does the distinction between the proposed types of tagging motivation influence the tagging process?
  • 6. Types of Tagging Motivations
      Property             | Categorizer        | Describer
      Goal                 | later browsing     | later retrieval
      Change of vocabulary | costly             | cheap
      Size of vocabulary   | limited            | open
      Tags                 | subjective         | objective
      Tag reuse            | frequent           | rare
      Tag purpose          | mimicking taxonomy | descriptive labels
      In the "real world", users are driven by a combination of both motivations, e.g. using tags as descriptive labels while maintaining a few categories [Körner2009].
  • 7. Terminology. Folksonomies are usually represented by tripartite graphs with hyperedges over three disjoint sets: a set of users u ∈ U, a set of tags t ∈ T and a set of resources r ∈ R. A folksonomy is defined as a set of annotations F ⊆ U × T × R. A personomy is the reduction of a folksonomy F to a single user u. A tag assignment (tas) is one specific triple (u, t, r) of a user u, a tag t and a resource r.
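The terminology above maps directly onto a set-of-triples data structure. The following is a minimal Python sketch, assuming users, tags and resources are plain string identifiers. The helper names `personomy` and `users_tags_resources` are hypothetical and are not taken from the authors' code.

```python
# Minimal sketch: a folksonomy as a set of tag assignments (u, t, r).
# Assumption: users, tags and resources are plain string identifiers.
Folksonomy = set[tuple[str, str, str]]


def personomy(folksonomy: Folksonomy, user: str) -> Folksonomy:
    """Reduce a folksonomy F to the tag assignments of a single user u."""
    return {(u, t, r) for (u, t, r) in folksonomy if u == user}


def users_tags_resources(folksonomy: Folksonomy) -> tuple[set[str], set[str], set[str]]:
    """Return the sets U, T and R spanned by the given tag assignments."""
    users = {u for u, _, _ in folksonomy}
    tags = {t for _, t, _ in folksonomy}
    resources = {r for _, _, r in folksonomy}
    return users, tags, resources


if __name__ == "__main__":
    F: Folksonomy = {
        ("alice", "python", "r1"),
        ("alice", "tutorial", "r1"),
        ("alice", "python", "r2"),
        ("bob", "recipes", "r3"),
    }
    P = personomy(F, "alice")
    print(len(P))  # 3 tag assignments belong to alice
    print(sorted(users_tags_resources(P)[1]))  # ['python', 'tutorial']
```

Storing tag assignments as a set of triples also means each (u, t, r) annotation is counted once, which matches the definition F ⊆ U × T × R.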
  • 8. Approximating Tagging Motivation / 1
      • Tag/Resource Ratio (trr): How many tags does a user use? $\mathrm{trr}(u) = \frac{|T_u|}{|R_u|}$, where $T_u$ and $R_u$ are the tags and resources in the personomy of user u. The ratio relates the vocabulary size of a user to the total number of resources annotated by this user. Describers, who use a variety of different tags for their resources, can be expected to score higher values than categorizers, who use fewer tags.
      • Orphaned Tag Ratio: How many tags of a user's vocabulary are attached to only a few resources? $\mathrm{orphan}(u) = \frac{|T_u^o|}{|T_u|}$ with $T_u^o = \{t \mid |R(t)| \le n\}$ and $n = \left\lceil \frac{|R(t_{max})|}{100} \right\rceil$. Here $R(t)$ is the set of resources the user annotated with tag t, and $t_{max}$ is the tag the user applied most often. Orphaned tags are assigned to only a few resources and are therefore used infrequently. The threshold n adapts to the individual tagging style. Categorizers are expected to score values closer to 0, because orphaned tags would introduce noise into their personal taxonomy. Describers are expected to score values closer to 1, because they tag resources in a verbose and descriptive way and do not mind orphaned tags.
      • Conditional Tag Entropy: How well does a user "encode" resources with his tags? Viewing tagging as an encoding process, the tags and resources of a user serve as random variables. The conditional entropy then reflects the effectiveness of this encoding: $H(R|T) = -\sum_{r \in R} \sum_{t \in T} p(r,t) \log_2 p(r|t)$. For categorizers, useful tags should be maximally discriminative with regard to the resources they are assigned to, so that tags can be used effectively for navigation and browsing. Describers have little interest in this, as they do not use tags for navigation.
      [Figure 1: Tag cloud example of a categorizer]
      [Table 2: Intuitions about Categorizers and Describers]
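The three measures on this slide can be computed from a single personomy. The following is a minimal Python sketch, assuming the personomy is given as (tag, resource) pairs; all function names are hypothetical. The slide does not state how $p(r,t)$ and $p(r|t)$ are estimated. The sketch therefore assumes count-based estimates: $p(r,t) = 1/|TAS_u|$ per distinct tag assignment and $p(r|t) = 1/|R(t)|$. The normalized form cte relates $H(R|T)$ to the conditional entropy $H_{opt}(R|T)$ of an ideal categorizer, which is not spelled out here, so `cte` simply takes that value as an argument.

```python
import math
from collections import defaultdict


def resources_per_tag(pairs):
    """Map each tag t to R(t), the set of resources the user annotated with t."""
    r_of_t = defaultdict(set)
    for tag, resource in pairs:
        r_of_t[tag].add(resource)
    return r_of_t


def trr(pairs):
    """Tag/resource ratio: |T_u| / |R_u|."""
    tags = {t for t, _ in pairs}
    resources = {r for _, r in pairs}
    return len(tags) / len(resources)


def orphan_ratio(pairs):
    """Share of tags with |R(t)| <= n, where n = ceil(|R(t_max)| / 100)."""
    r_of_t = resources_per_tag(pairs)
    n = math.ceil(max(len(rs) for rs in r_of_t.values()) / 100)
    orphans = [t for t, rs in r_of_t.items() if len(rs) <= n]
    return len(orphans) / len(r_of_t)


def conditional_entropy(pairs):
    """H(R|T) = -sum p(r,t) * log2 p(r|t), with count-based estimates (assumption):
    p(r,t) = 1 / |TAS_u| per distinct (t, r) pair and p(r|t) = 1 / |R(t)|."""
    tas = set(pairs)  # distinct tag assignments of this user
    r_of_t = resources_per_tag(tas)
    h = 0.0
    for tag, _ in tas:
        p_rt = 1 / len(tas)
        p_r_given_t = 1 / len(r_of_t[tag])
        h -= p_rt * math.log2(p_r_given_t)
    return h


def cte(h, h_opt):
    """Normalized conditional tag entropy; h_opt must be supplied by the caller."""
    return (h - h_opt) / h_opt


if __name__ == "__main__":
    pairs = [("python", "r1"), ("tutorial", "r1"), ("python", "r2"), ("python", "r3")]
    print(trr(pairs))                  # 2 tags / 3 resources
    print(orphan_ratio(pairs))         # n = 1, so only "tutorial" is orphaned: 0.5
    print(conditional_entropy(pairs))  # 0.75 * log2(3)
```

In this toy personomy, the threshold is n = ⌈3/100⌉ = 1. Only "tutorial", attached to a single resource, counts as orphaned. With the very large personomies used in the evaluation, n grows with the user's most frequent tag.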
  • 9. Approximating Tagging Motivation / 2
      • Overlap Factor: Are tags used as discriminative categories? When users assign more than one tag per resource on average, the resource sets of their tags may overlap (intersect). The overlap factor relates the number of all resources to the total number of tag assignments of a user: $\mathrm{overlap}(u) = 1 - \frac{|R_u|}{|TAS_u|}$. Categorizers can be expected to keep this overlap relatively low in order to produce discriminative categories, i.e. categories that are free from intersections. Describers would not mind a high overlap, since they use tags for later retrieval rather than for navigation.
      • Tag/Title Intersection Ratio (ttr): How likely is a user to choose words from the title as tags? The ratio addresses the objectiveness or subjectiveness of tags by measuring how often users pick tags from a resource's title (e.g. the title of a web page). All titles in a personomy are tokenized into the set of title words $TW_u$, which is filtered with the stop-word list packaged with the Snowball stemmer. The absolute intersection size is then normalized by the number of title words: $\mathrm{ttr}(u) = \frac{|T_u \cap TW_u|}{|TW_u|}$.
      • Conditional tag entropy is normalized by relating it to the conditional entropy $H_{opt}(R|T)$ of an ideal categorizer: $\mathrm{cte}(u) = \frac{H(R|T) - H_{opt}(R|T)}{H_{opt}(R|T)}$.
      Proposed measures per intuition:
      Property             | Categorizer        | Describer          | Proposed measure
      Goal                 | later browsing     | later retrieval    | –
      Change of vocabulary | costly             | cheap              | –
      Size of vocabulary   | limited            | open               | Tag/Resource Ratio
      Tags                 | subjective         | objective          | Tag/Title Intersection Ratio
      Tag reuse            | frequent           | rare               | Orphaned Tag Ratio / Cond. Tag Entropy
      Tag purpose          | mimicking taxonomy | descriptive labels | Overlap Factor
      All five measures focus on tagging behavior rather than on the semantics of tags. This makes them independent of a particular language and unaffected by special characters, internet slang or user-specific tags such as "to_read". In addition, they evaluate properties of a single user's personomy only.
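The overlap factor and the tag/title intersection ratio can be sketched the same way. This minimal Python sketch again assumes the personomy is given as (tag, resource) pairs, with titles supplied as a dict from resource id to title string. The function names and the tiny `STOP_WORDS` set are hypothetical stand-ins. The sketch does not reproduce the Snowball stop-word list used in the evaluation.

```python
import re

# Hypothetical, deliberately tiny stop-word set; the evaluation uses the
# stop-word list packaged with the Snowball stemmer instead.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def overlap_factor(pairs):
    """overlap(u) = 1 - |R_u| / |TAS_u|, counting distinct (tag, resource) pairs."""
    tas = set(pairs)
    resources = {r for _, r in tas}
    return 1 - len(resources) / len(tas)


def title_words(titles):
    """Tokenize titles into the lower-cased word set TW_u without stop words."""
    words = set()
    for title in titles:
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        words.update(w for w in tokens if w not in STOP_WORDS)
    return words


def ttr(pairs, titles):
    """Tag/title intersection ratio: |T_u ∩ TW_u| / |TW_u|.

    titles maps resource ids to title strings; only the user's resources count.
    Assumes at least one title word survives filtering (else division by zero).
    """
    tags = {t.lower() for t, _ in pairs}
    resources = {r for _, r in pairs}
    tw = title_words(titles[r] for r in resources if r in titles)
    return len(tags & tw) / len(tw)


if __name__ == "__main__":
    pairs = [("python", "r1"), ("tutorial", "r1"), ("python", "r2"), ("toread", "r3")]
    titles = {
        "r1": "A Python Tutorial",
        "r2": "Learning Python the Hard Way",
        "r3": "Notes on Folksonomies",
    }
    print(overlap_factor(pairs))  # 1 - 3/4 = 0.25
    print(ttr(pairs, titles))     # {python, tutorial} out of 7 title words
```

A user who assigns exactly one tag per resource gets an overlap of 0. Each extra tag on an already-tagged resource pushes the value toward 1.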
  • 10. Approximating Tagging Motivation / 3. Properties of the developed measures: • Agnostic to the semantics of the language used • Evaluate the behavior of a single user (as opposed to the complete folksonomy), so no comparison to the complete folksonomy is necessary • Inspect the usage of tags and NOT their semantic meaning: How often are tags used? How many tags are used on average to annotate a resource? How well does a user "encode" her resources with tags?
  • 11. Experimental Setup. Delicious dataset: part of a collection of tagging datasets which we crawled from May to June 2009. The captured folksonomy consists of 896 users, 184,746 tags and 1,089,653 resources. Requirements for the dataset: • Complete personomies, i.e. all tags and resources that were publicly available • The chronological order of the posts should be preserved, to capture changes in tagging behavior • "Mostly inactive" users who have few annotated resources should be excluded; the lower bound of tagged resources was 1000 in the case of the Delicious dataset
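The dataset requirements translate into a simple preprocessing step. The sketch below assumes a hypothetical `Post` record (user, resource, tags, timestamp) and a hypothetical `select_personomies` helper. It keeps only users with at least 1000 distinct tagged resources, the lower bound used for Delicious. It also sorts each remaining personomy chronologically so that changes in tagging behavior stay observable. It does not check whether a crawled personomy is complete.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """Hypothetical post record: one user annotating one resource at one time."""
    user: str
    resource: str
    tags: tuple[str, ...]
    timestamp: datetime


def select_personomies(posts, min_resources=1000):
    """Keep users with at least min_resources distinct tagged resources and
    return each remaining personomy as a chronologically sorted list of posts."""
    by_user = defaultdict(list)
    for post in posts:
        by_user[post.user].append(post)
    selected = {}
    for user, user_posts in by_user.items():
        if len({p.resource for p in user_posts}) >= min_resources:
            selected[user] = sorted(user_posts, key=lambda p: p.timestamp)
    return selected


if __name__ == "__main__":
    posts = [
        Post("alice", "r2", ("python",), datetime(2009, 5, 3)),
        Post("alice", "r1", ("tutorial",), datetime(2009, 5, 2)),
        Post("bob", "r1", ("python",), datetime(2009, 5, 1)),
    ]
    # A lower bound of 2 keeps the toy example small; Delicious used 1000.
    kept = select_personomies(posts, min_resources=2)
    print({u: [p.resource for p in ps] for u, ps in kept.items()})  # {'alice': ['r1', 'r2']}
```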