Meaning as Collective Use:
         Predicting Semantic Hashtag Categories on Twitter

                                         Lisa Posch                       Claudia Wagner
                               Graz University of Technology,          JOANNEUM RESEARCH,
                                  Knowledge Management                Institute for Information and
                                          Institute                  Communication Technologies
                                Inffeldgasse 13, 8010 Graz,          Steyrergasse 17, 8010 Graz,
                                           Austria                                Austria
                                 lposch@sbox.tugraz.at                clauwa@sbox.tugraz.at
                                     Philipp Singer                     Markus Strohmaier
                               Graz University of Technology,        Graz University of Technology,
                                  Knowledge Management                  Knowledge Management
                                          Institute                             Institute
                                Inffeldgasse 13, 8010 Graz,           Inffeldgasse 13, 8010 Graz,
                                           Austria                               Austria
                                philipp.singer@tugraz.at             markus.strohmaier@tugraz.at

ABSTRACT                                                              Thus, new methods and techniques for analyzing the se-
This paper sets out to explore whether data about the us-             mantics of hashtags are definitely needed.
age of hashtags on Twitter contains information about their              A simplistic view on Wittgenstein’s work [17] suggests
semantics. Towards that end, we perform initial statisti-             that meaning is use. This indicates that the meaning of
cal hypothesis tests to quantify the association between us-          a word is not defined by a reference to the object it denotes,
age patterns and semantics of hashtags. To assess the util-           but by the variety of uses to which the word is put. There-
ity of pragmatic features – which describe how a hashtag              fore, one can use the narrow, lexical context of a word (i.e.,
is used over time – for semantic analysis of hashtags, we             its co-occurring words) to approximate its meaning. Our
conduct various hashtag stream classification experiments              work builds on this observation, but focuses on the prag-
and compare their utility with the utility of lexical features.       matics of a word (i.e., how a word, or in our case a hashtag,
Our results indicate that pragmatic features indeed contain           is used by a large group of users) – rather than its narrow,
valuable information for classifying hashtags into semantic           lexical context.
categories. Although pragmatic features do not outperform                The aim of this work is to investigate to what extent prag-
lexical features in our experiments, we argue that pragmatic          matic characteristics of a hashtag (which capture how a large
features are important and relevant for settings in which tex-        group of users uses a hashtag) may reveal information about
tual information might be sparse or absent (e.g., in social           its semantics. Specifically, our work addresses the following
video streams).                                                       research questions:

                                                                         • Do different semantic categories of hashtags reveal sub-
Categories and Subject Descriptors                                         stantially different usage patterns?
H.3.5 [Information Storage and Retrieval]: Online In-                    • To what extent do pragmatic and lexical properties of
formation Services—Web-based services                                      hashtags help to predict the semantic category of a
                                                                           hashtag?
Keywords                                                                To address these research questions we conducted an em-
Twitter; hashtags; social structure; semantics                        pirical study on a broad range of diverse hashtag streams be-
                                                                      longing to eight different semantic categories (such as tech-
                                                                      nology, sports or idioms) which have been identified in pre-
1.   INTRODUCTION                                                     vious research [12] and have been shown to be useful for grouping
   A hashtag is a string of characters preceded by the hash           hashtags. From each of the eight categories, we selected ten
(#) character and it is used on platforms like Twitter as a           sample hashtags at random and collected temporal
descriptive label or to build communities around particular           snapshots of messages containing at least one of these hashtags
topics [15]. To outside observers, the meaning of hashtags            at three different points in time. To quantify how hashtags
is usually difficult to analyze, as they consist of short, often        are used over time, we extended the set of pragmatic stream
abbreviated or concatenated concepts (e.g., #MSM2013).                measures which we introduced in our previous work [16] and
                                                                      applied them to the hashtag streams in our dataset. These
Copyright is held by the International World Wide Web Conference      pragmatic measures capture not only the social structure of
Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink
to the author’s site if the Material is used in electronic media.
                                                                      a hashtag at specific points in time, but also the changes in
WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil.          social structure over time.
ACM 978-1-4503-2038-2/13/05.
To answer the first research question, we used standard           grounded and classified into predefined semantic categories.
statistical tests which allow us to quantify the association      For example, Noll and Meinel [8] presented a study of the
between pragmatic characteristics of hashtag streams and          characteristics of tags and determined their usefulness for
their semantic categories. To tackle the second research          web page classification [9]. Similar to our work, Overell et
question, we first computed lexical features using a              web page classification [9]. Similar to our work, Overell et
standard bag-of-words model with term frequency (TF). Then,       al. [10] presented an approach which allows classifying tags
we trained several classification models with lexical features     Wikipedia articles into semantic categories, mapped Flickr
only, pragmatic features only and a combination of both. We       tags to Wikipedia articles using anchor texts in Wikipedia
compared the performance of different classification models         and finally classified Flickr tags into semantic categories by
by using standard evaluation measures such as the F1-score        using the previously trained classifier. Their results show
(which is defined as the harmonic mean of precision and            that their ClassTag system increases the coverage of the vo-
recall). To get a fair baseline for our classification mod-        cabulary by 115% compared to a simple WordNet approach
els, we constructed a control dataset by randomly shuffling         which classifies Flickr tags by mapping them to WordNet
the category labels of the hashtag streams. That means we         via string matching techniques. Unlike our work, they did
destroyed the original relationship between the pragmatic         not take into account how tags are used, but learn relations
properties and the semantic categories of hashtags.               between tags and semantic categories via mapping them to
   Our results show that pragmatic features indeed reveal         Wikipedia articles.
information about hashtags’ semantics and perform signifi-            Pragmatics and semantics of hashtags: On Twitter,
cantly better than the baseline. They can therefore be useful     users have developed a tagging culture by adding a hash
for the task of semantically annotating social media con-         symbol (#) in front of a short keyword. The first introduc-
tent. Not surprisingly, our results also show that lexical fea-   tion of the usage of hashtags was provided by Chris Messina
tures are more suitable than pragmatic features for the task      in a blog post [7]. Huang et al. [4] state that this kind
of semantically categorizing hashtag streams. However, an         of new tagging culture has created a completely new phe-
advantage of pragmatic features is that they are language-        nomenon, called micro-meme. The difference between such
and text-independent. Pragmatic features can be applied to        micro-memes and other social tagging systems is that the
tasks where the creation of lexical features is not possible –    participation in micro-memes is an a-priori approach, while
such as multimedia streams. Also for scenarios where tex-         other social tagging systems follow an a-posteriori approach.
tual content is available, pragmatic features allow for more      This is due to the fact that users are influenced by the ob-
flexibility due to their independence of the language used in      servation of the usage of micro-meme hashtags adopted by
the corpus. Our results are relevant for social media and         other users. The work of [4] suggests that hashtagging in
semantic web researchers who are interested in analyzing          Twitter is more commonly used to join public discussions
the semantics of hashtags in textual or non-textual social        than to organize content for future retrieval. The role of
streams (e.g., social video streams).                             hashtags has also been investigated in [18]. Their study
   This paper is structured as follows: Section 2 gives an        confirms that a hashtag serves both as a tag of content and
overview of related research on analyzing the semantics of        a symbol of community membership. Laniado and Mika [6]
tags in social bookmarking systems and research on hash-          explored to what extent hashtags can be used as strong iden-
tagging on Twitter in general. In Section 3 we describe           tifiers like URIs are used in the Semantic Web. They mea-
our experimental setup, including our datasets, feature en-       sured the quality of hashtags as identifiers for the Semantic
gineering and evaluation approach. Our results are reported       Web, defining several metrics to characterize hashtag usage
in Section 4 and further discussed in Section 5. Finally, we      on the dimensions of frequency, specificity, consistency, and
conclude our work in Section 6.                                   stability over time. Their results indicate that the lexical
                                                                  usage of hashtags can indeed be used to identify hashtags
2.   RELATED WORK                                                 which have the desirable properties of strong identifiers. Un-
                                                                  like our work, their work focuses on lexical usage patterns
                                                                  and measures to what extent those patterns contribute to
                                                                  the differentiation between strong and weak semantic
                                                                  identifiers (binary classification) while we use usage patterns to
                                                                  classify hashtags into semantic categories.
                                                                     Recently, researchers have also started to explore the
                                                                  diffusion dynamics of hashtags - i.e., how hashtags spread in
                                                                  online communities. For example the work of [15] aims to
                                                                  predict the exposure of a hashtag in a given time frame
                                                                  while [12] are interested in the temporal spreading patterns
                                                                  of hashtags.

   In the past, a considerable effort has been spent on
studying the semantics of tags (e.g., tags in social bookmarking
systems), but also hashtags in Twitter have received
attention from the research community.
   Semantics of tags: On the one hand, researchers
explored to what extent semantics emerge from folksonomies
by investigating different algorithms for extracting tag
networks and hierarchies from such systems (see e.g., [1], [3] or
[13]). The work of [14] evaluated three state-of-the-art
folksonomy induction algorithms in the context of five social
tagging systems. Their results show that those algorithms
specifically developed to capture intuitions of social tagging
systems outperform traditional hierarchical clustering            3.   EXPERIMENTAL SETUP
techniques. Körner et al. [5] investigated how tagging usage
                                                                    Our experiments are designed to explore to what extent
patterns influence the quality of the emergent semantics. They    pragmatic properties of hashtag streams can be used to
found that ‘verbose’ taggers (describers) are more useful for     gauge the semantic category of a hashtag. We are not only
the emergence of tag semantics than users who use a small         interested in the idiosyncrasies of hashtag usage within one
set of tags (categorizers).                                       semantic category but also in the deltas between different
   On the other hand, researchers investigated to what extent     semantic categories. In this section, we first introduce our
tags (and the resources they annotate) can be semantically        dataset as well as the pragmatic and lexical measures which
we used to describe hashtag streams. Then we present the
methodology and evaluation approach which we used to
answer our research questions.

[Figure 1 (diagram labels): time points t0 = 3/4/2012, t1 = 4/1/2012, t2 = 4/29/2012; "1 week"; at each time point a crawl of stream tweets and a crawl of social structure]


3.1   Dataset
   In this work we use data that we acquired from Twit-                               Figure 1: Timeline of the data collection process
ter’s API. Romero et al. [12] conducted a user study and a
classification experiment and identified eight broad semantic
categories of hashtags: celebrity, games, idiom, movies/TV,
music, political, sports and technology. We used a list
consisting of the 500 hashtags which were used by most users
within their dataset and which were manually assigned to
the eight categories as a starting point for creating our own
dataset.
   For each category, we chose ten hashtags at random (see
Table 1). We biased our random sample towards active
hashtag streams by re-sampling hashtags for which we found
fewer than 1000 posts at the beginning of our data collection
(March 4th, 2012). For those categories for which we could
not find ten hashtags that had more than 1000 posts (i.e.,
games and celebrity), we selected the most active hashtags
per category (i.e., the hashtags for which we found the most
posts).

                                                                                    categories (concretely, we removed the two hashtags #bsb
                                                                                    and #mj). We also decided to remove inactive hashtag
                                                                                    streams (those where fewer than 300 posts were retrieved)
                                                                                    as estimating information theoretic measures is problematic
                                                                                    if only few observations are available [11]. The most common
                                                                                    solution is to restrict the measurements to situations where
                                                                                    one has an adequate amount of data. We found four inactive
                                                                                    hashtags in the category games and seven in the category
                                                                                    celebrity. The removal of these hashtag streams resulted in
                                                                                    the complete removal of the category celebrity as it was only
                                                                                    left with one hashtag stream (#michaeljackson). A
                                                                                    possible explanation for the low number of tweets in the hashtag
                                                                                    streams for this category is that topics related to celebrities
                                                                                    have a shorter life-span than topics related to other
                                                                                    categories. Our final datasets consist of 64 hashtag streams and
                                                                                    seven semantic categories which were sufficiently active
                                                                                    during our observation period.

   The dataset consists of three parts, each part representing
a time frame of four weeks. The different time frames ensure
that we can observe the usage of a hashtag over a given
period of time. The time frames are independent of each
other, i.e., the data collected at one time frame does not
contain any information of the data collected at another time
frame.

                                                                                    Table 2: Description of the complete dataset
                                                                                                                         t0            t1            t2
                                                                                    Tweets                           94,634        94,984        95,105
                                                                                    Authors                          53,593        54,099        53,750
                                                                                    Followers                    56,685,755    58,822,119    66,450,378
                                                                                    Followees                    34,025,961    34,263,129    37,674,363
                                                                                    Friends                      21,696,134    21,914,947    24,449,705
                                                                                    Mean Followers per Author      1,057.71      1,087.31      1,236.29
                                                                                    Mean Followees per Author        634.90        633.34        700.92
                                                                                    Mean Friends per Author          404.83        405.09        454.88

   At the start of each time frame, we retrieved the most
recent tweets in English for each hashtag using Twitter’s
public search API. Afterwards, we retrieved the followers
and followees of each user who had authored at least one
message in our hashtag stream dataset. Some pragmatic fea-
tures capture information about who potentially consumes
a hashtag stream (followers) or who potentially informs au-
thors of a hashtag stream (followees) and therefore require                         3.2         Feature Engineering
the one-hop neighborhood of hashtag streams’ authors. In                              In the following, we define the pragmatic and lexical fea-
this work, we call users who hold both of these roles (i.e.,                        tures which we designed to capture the different social and
have established a bidirectional link with an author) friends.                      message based structures of hashtag streams. For our prag-
The starting dates of the time frames were March 4th (t0 ),                         matic features we further differentiate between static prag-
April 1st (t1 ) and April 29th, 2012 (t2 ). Table 2 depicts                         matic features (which capture the social structure of a hash-
the number of tweets and relations between users that we                            tag at a specific point in time) and dynamic pragmatic fea-
collected during each time frame.                                                   tures (which combine information from several time points).
   The stream tweets were retrieved on the first day of each
time frame, fetching tweets that were authored a maximum                             3.2.1             Static Pragmatic Measures:
of seven days previous to the date of retrieval. During the                            Entropy Measures are used to measure the random-
first week of each time frame, the user IDs of the followers                         ness of streams’ authors and their followers, followees and
and followees were collected. Figure 1 depicts this process.                        friends. For each hashtag stream, we rank the authors
   Since we were interested in learning what types of                              by the number of messages they published in that stream
characteristics are useful for describing a semantic hashtag                       (norm_entropy_author) and we rank the followers
category, we removed hashtag streams that belong to multiple                       (norm_entropy_follower), followees (norm_entropy_followee) and


                     Table 1: Randomly selected hashtags per category (ordered alphabetically)
               technology                 idioms     sports      political          games                music                           celebrity                 movies
                blackberry         factaboutme             f1      climate               e3                 bsb                      ashleytisdale                   avatar
                       ebay         followfriday    football          gaza           games           eurovision                  brazilmissesdemi                     bbcqt
                  facebook         dontyouhate           golf   healthcare         gaming                lastfm                                bsb                    bones
                      flickr         iloveitwhen      nascar           iran     mafiawars           listeningto                    michaeljackson                     chuck
                     google                 iwish        nba         mmot     mobsterworld                   mj                                 mj                      glee
                    iphone             nevertrust        nhl          noh8            mw2                 music                               niley              glennbeck
                  microsoft             omgfacts      redsox        obama               ps3       musicmonday                                 regis                 movies
                 photoshop     oneofmyfollowers       soccer       politics     spymaster          nowplaying                          teamtaylor             supernatural
               socialmedia       rememberwhen         sports      teaparty     uncharted2             paramore                          tilatequila                       tv
                    twitter       wheniwaslittle    yankees         tehran             wow                 snsd                   weloveyoumiley                    xfactor
friends (norm_entropy_friend) by the number of the stream’s      3.2.3    Lexical Measures:
authors they are related to. A high author entropy                  We use vector-based methods which allow representing
indicates that the stream is created in a democratic way since all   each microblog message as a vector of terms and use term
authors contribute equally much. A high follower entropy         frequency (TF ) as weighting schema. In this work lexical
and friend entropy indicate that the followers and friends do    measures are always computed for individual time points
not focus their attention towards few authors but distribute     and are therefore static measures.
it equally across all authors. A high followee entropy and
friend entropy indicate that the authors do not focus their      3.3     Usage Patterns of Hashtag Categories
attention on a selected part of their audience.                     Our first aim is to investigate whether different
   Overlap Measures describe the overlap between the             semantic categories of hashtags reveal substantially different
authors and the followers (overlap_authorfollower), followees    usage patterns (such as that they are used and/or consumed
(overlap_authorfollowee) or friends (overlap_authorfriend) of    by different sets of users or that they are used for
a hashtag stream. If overlap is one, all authors of a stream     different purposes). To compare the pragmatic fingerprints of
are also followers, followees or friends of stream authors.      hashtags belonging to different semantic categories and to
This indicates that the stream is consumed and produced by       quantify the differences between categories, we conducted
the same users. A high overlap suggests that the community       a pairwise Mann-Whitney-Wilcoxon-Test which is a non-
around the hashtag is rather closed, while a low overlap indi-   parametric statistical hypothesis test for assessing whether
cates that the community is more open and that active and        one of two samples of independent observations tends to have
passive part of the community do not extensively overlap.        larger values than the other. We used a non-parametric test
   Coverage Measures characterize a hashtag stream via           since the Shapiro-Wilk-Test revealed that not all features
the nature of its messages. We introduce four coverage           are normally distributed, even after applying an arcsine
measures. The informational coverage measure                     transformation to ratio measures. The Holm-Bonferroni method was
(informational) indicates how many messages of a stream have an  used to adjust the p-values and counteract the problem
informational purpose - i.e., contain a link. The conversa-      of multiple comparisons. For this experiment, we used the
tional coverage (conversational) measures the mean number        timeframes t0→1 and t1→2 .
of messages of a stream that have a conversational purpose
- i.e., those messages that are directed to one or several
specific users (e.g., through @replies). The retweet cover-
age (retweet) measures the percentage of messages which
are retweets. The hashtag coverage (hashtag) measures the
mean number of hashtags per message in a stream.
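
The static pragmatic measures described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the normalized entropy is the Shannon entropy of a count distribution divided by its maximum (the logarithm of the number of observed users), that the overlap is the fraction of stream authors who also occur in the other user set, and that links and retweets can be detected with simple string checks. All function names, variable names and the toy stream are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled to [0, 1].

    Assumption: normalization divides by log(n), the entropy of a
    uniform distribution over the n observed users.
    """
    values = [c for c in counts if c > 0]
    if len(values) < 2:
        return 0.0
    total = sum(values)
    entropy = -sum((c / total) * math.log(c / total) for c in values)
    return entropy / math.log(len(values))


def overlap(authors, others):
    """Fraction of stream authors that also occur in `others` (assumed definition)."""
    authors = set(authors)
    if not authors:
        return 0.0
    return len(authors & set(others)) / len(authors)


def coverage(messages, predicate):
    """Share of messages for which `predicate` holds."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if predicate(m)) / len(messages)


# Toy stream of (author, text) pairs; all data here is made up.
stream = [
    ("alice", "New phone review http://example.org #iphone"),
    ("bob", "RT @alice: New phone review #iphone"),
    ("alice", "@bob thanks! #iphone"),
    ("carol", "#iphone battery life is fine"),
]
texts = [text for _, text in stream]
tweets_per_author = Counter(author for author, _ in stream)

norm_entropy_author = normalized_entropy(tweets_per_author.values())
overlap_authorfollower = overlap(tweets_per_author, {"bob", "carol", "dave"})
informational = coverage(texts, lambda t: "http://" in t or "https://" in t)
retweet = coverage(texts, lambda t: t.startswith("RT @"))
```

In this toy stream the author entropy is close to one because the three authors contribute almost equally, and two of the three authors also appear in the hypothetical follower set.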
3.2.2    Dynamic Pragmatic Measures:
    To explore how the social structure of a hashtag stream
changes over time, we measure the distance between the
tweet-frequency distributions of authors at different time
points, and the author-frequency distributions of followers,
followees or friends at different time points. The intuition
behind these features is that certain semantic categories of           Figure 2: Illustration of our time frames
hashtags may have a fast changing social structure since
new people start and stop using those types of hashtags
frequently, while other semantic categories may have a more      3.4     Hashtag Classification
stable community around them which changes less over time.          Our second aim is to investigate to what extent pragmatic
    We use a symmetric variation of the Kullback-Leibler            and lexical properties of hashtag streams contribute to
divergence ($D_{KL}$) which represents a natural distance         classifying them into their semantic categories. That means we
measure between two probability distributions (A and B) and      aim to classify temporal snapshots of hashtag streams into
is defined as follows: $\frac{1}{2} D_{KL}(A \| B) + \frac{1}{2} D_{KL}(B \| A)$. The KL   their correct semantic categories (to which they were
divergence is also known as relative entropy or information      assigned in [12]) just by analyzing how they are used over
divergence. The KL divergence is zero if the two                 time. We then compare the performance of the
distributions are identical and approaches infinity as they differ   pragmatically informed classifier with the performance of a classifier
more and more. We measure the KL divergence for the              informed by lexical features within a semantic multiclass
distributions of authors (kl_authors), followers (kl_followers),   classification task. We used the timeframes t1→0 and t2→1
followees (kl_followees) and friends (kl_friends).               for this experiment in order to avoid including information
    Figure 2 visualizes the different time frames and their no-   from the ‘future’ in our classification.
tation. t0 only contains the static features computed from          We performed grid search with varying hyperparameters
data collected at t0. Consequently, t1 and t2 only contain       using Support Vector Machines (linear and RBF kernels) and
the static features computed from data collected at t1 or        an ensemble method with extremely randomized trees. Since
t2, respectively. t0→1 includes static features computed on      extremely randomized trees are a probabilistic method and
data collected at t0 and the dynamic measures computed           perform slightly differently in each run, we ran them ten times
on data collected at t0 and t1 . t1→0 includes static features   and report the average scores. The features were standard-
computed on data collected at t1 and the dynamic measures        ized by subtracting the mean and scaling to unit variance.
computed on data collected at t0 and t1 . t1→2 and t2→1 are      We used stratified 6-fold cross-validation (CV) to train and
defined in the same way.                                          test each classification model.
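   As a rough illustration of the symmetric KL divergence used for the dynamic measures (kl authors,
kl followers, kl followees, kl friends), the following minimal sketch computes it for one hashtag
stream observed at two time points. The additive smoothing, the base-2 logarithm, the function
names and the toy author counts are assumptions made for this example and are not taken from the
paper.

```python
import math
from collections import Counter


def to_distribution(counts, support, alpha=1.0):
    """Turn raw counts into a smoothed probability distribution over `support`.

    Additive smoothing (alpha) is an assumption of this sketch; it keeps every
    probability positive so that the KL divergence stays finite.
    """
    total = sum(counts.get(x, 0) for x in support) + alpha * len(support)
    return {x: (counts.get(x, 0) + alpha) / total for x in support}


def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same support."""
    return sum(p[x] * math.log2(p[x] / q[x]) for x in p if p[x] > 0)


def symmetric_kl(counts_a, counts_b, alpha=1.0):
    """0.5 * D_KL(A || B) + 0.5 * D_KL(B || A) over the union of both supports."""
    support = set(counts_a) | set(counts_b)
    p = to_distribution(counts_a, support, alpha)
    q = to_distribution(counts_b, support, alpha)
    return 0.5 * kl_divergence(p, q) + 0.5 * kl_divergence(q, p)


# Hypothetical author -> tweet-count data for one hashtag stream at t0 and t1.
authors_t0 = Counter({"alice": 5, "bob": 3, "carol": 2})
authors_t1 = Counter({"alice": 1, "dave": 6, "erin": 3})

kl_authors = symmetric_kl(authors_t0, authors_t1)
```

   A stream whose authors barely change between the two time points yields a value close to zero,
while a stream with a largely new set of authors yields a larger value.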
   Since we have two different types of pragmatic features, static and dynamic ones, we trained
and tested three separate classification models which were only informed by pragmatic information:
   Static Pragmatic Model: We trained and tested this classification model with static pragmatic
features on data collected at t1 using stratified 6-fold CV. The experiment was repeated on the
data collected at t2.
   Dynamic Pragmatic Model: We trained and tested the classification model with dynamic pragmatic
features on data collected at t0 and t1 using stratified 6-fold CV. The computation of our dynamic
features requires at least two time points. We repeated this experiment on data collected at t1
and t2.
   Combined Pragmatic Model: We combined the static and dynamic pragmatic features, and trained
and tested the classification model on the data of t1→0 using stratified 6-fold CV. Again, we
repeated the experiment on the data of t2→1.
   We also performed our classification with a model using our lexical features (i.e., TF weighted
words). Finally, we trained and tested a combined classification model using pragmatic and lexical
features, which leads to the following classification models:
   Lexical Model: We trained and tested the model on data from t1 using stratified 6-fold CV and
repeated the experiment on data collected at t2.
   Combined Pragmatic and Lexical Model: We trained and tested the mixed classifier on the data of
t1→0 using stratified 6-fold CV, then repeated this experiment for t2→1. A simple concatenation of
pragmatic and lexical features is not useful, since the large number of lexical features would
dominate the pragmatic features. Therefore, we used a stacking method (see [2]). First, we
performed a classification using lexical features alone and Leave-One-Out cross-validation. We
used an SVM with linear kernel for this classification since it worked best for these features.
Second, we combined the pragmatic features with the resulting seven probability features which we
got from the previous classification model and which describe how likely each semantic class is
for a certain stream given its words.
   To get a fair baseline for our experiment, we constructed a control dataset by randomly
shuffling the category labels of the 64 hashtag streams. That means we destroyed the original
relationship between the pragmatic properties and the semantic categories of hashtags and
evaluated the performance of a classifier which tries to use pragmatic properties to classify
hashtags into their shuffled categories within a 6-fold cross-validation. We repeated the random
shuffling 100 times and used the resulting average F1-score as our baseline performance. For the
baseline classifier we also used grid search to determine the optimal parameters prior to
training. Our baseline classifier tests how well randomly assigned categories can be identified
compared to our real semantic categories. Note that a simple random guesser would be a weaker
baseline than the one described above and would lead to a performance of 1/7.
   To gain further insights into the impact of individual properties, we analyzed their
information gain (IG) with respect to the categories. The information gain measures how accurately
a specific stream property P is able to predict a stream's category C and is defined as follows:
IG(C, P) = H(C) − H(C | P), where H denotes the entropy.

4.   RESULTS
   In the following section, we present the results from our empirical study on usage patterns of
different semantic categories of hashtags.

4.1   Usage Patterns of Hashtag Categories
   To answer our first research question, we explored to what extent usage patterns of hashtag
streams in different semantic categories are indeed significantly different.
   Our results indicate that some pragmatic measures are indeed significantly different for
distinct semantic categories. This indicates that hashtags of certain categories are used in a
very specific way, which may allow us to relate these hashtags to their semantic categories just
by observing how users use them. Table 3 depicts the measures that show statistically significant
(p < 0.05) differences in both t0→1 and t1→2. In total, 35 pragmatic category differences were
found to be statistically significant (with p < 0.05) for t0→1 and 33 for t1→2. 26 pragmatic
category differences were found to be significant for both t0→1 and t1→2, which suggests that the
results are independent of our choice of time frame.
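   The information gain from the experimental setup, IG(C, P) = H(C) − H(C | P), can be
illustrated with a small sketch. The paper does not state how numeric properties were discretized,
so the equal-width binning, the function names and the toy values below are assumptions made for
this example only.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy H (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(values, bins=3):
    """Equal-width binning; the binning scheme is an assumption of this sketch."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def information_gain(categories, prop_values, bins=3):
    """IG(C, P) = H(C) - H(C | P) for a numeric stream property P."""
    groups = defaultdict(list)
    for category, bin_id in zip(categories, discretize(prop_values, bins)):
        groups[bin_id].append(category)
    n = len(categories)
    h_conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(categories) - h_conditional


# Hypothetical streams: category labels and two made-up property vectors.
categories = ["idioms", "idioms", "political", "political", "technology", "technology"]
informational = [0.05, 0.10, 0.55, 0.60, 0.90, 0.95]  # separates the categories
constant = [0.5] * 6                                   # carries no information

ig_informational = information_gain(categories, informational)
ig_constant = information_gain(categories, constant)
```

   In this toy data the first property separates the three categories perfectly, so its
information gain equals H(C), whereas the constant property has an information gain of zero.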


Table 3: Features which showed a statistically significant difference (with p < 0.05) for each pair of categories
in both t0→1 and t1→2
                              games           idioms          movies         music        political                sports
                idioms        informational
                              retweet
                movies                        informational
                music                         informational
                political     kl followers    kl authors
                                              kl followers
                                              kl followees
                                              informational
                                              hashtag
                sports        kl followers    kl authors
                                              kl followers
                                              informational
                technology    kl followers    kl authors      kl friends     kl friends   overlap authorfollower
                                              kl followers                                overlap authorfriend
                                              kl followees
                                              kl friends
                                              informational
                                              retweet
                                              hashtag
[Figure 3 panels: per-category distributions (games, idioms, movies, music, political, sports,
technology) of (a) Informational Coverage, (b) KL Divergence Authors, (c) KL Divergence Followers
and (d) KL Divergence Friends.]

(a) This figure shows the percentage of messages of hashtag streams belonging to different
categories that contain at least one link.
(b) This figure shows how much the authors' tweet-frequency distributions of hashtag streams of
different categories change on average.
(c) This figure shows how much the followers' author-frequency distributions of hashtag streams of
different categories change on average.
(d) This figure shows how much the friends' author-frequency distributions of hashtag streams of
different categories change on average.

Figure 3: Each plot shows the feature distribution of different categories of one of the 4 best pragmatic
features for t0→1 . We obtained similar results for t1→2 .


   Not surprisingly, the category which shows the most specific usage patterns is idioms, and
therefore the hashtags of this category can be distinguished from all other hashtags just by
analyzing their pragmatic properties. Hashtag streams of the category idioms exhibit a
significantly lower informational coverage than hashtag streams of all other categories (see
Figure 3(a)) and a significantly higher symmetric KL divergence for authors' tweet-frequency
distributions (see Figure 3(b)). The followers' and friends' author-frequency distributions also
tend to have a higher symmetric KL divergence for idioms hashtags than for other hashtags (see
Figures 3(c) and 3(d)). This indicates that the social structure of hashtag streams in the
category idioms changes faster than that of hashtag streams in other categories. Furthermore,
hashtag streams of this category are less informative, i.e., they contain significantly fewer
links per message on average.
   The category technology can be distinguished from all other categories except sports,
particularly because its followers' and friends' author-frequency distributions have significantly
lower symmetric KL divergences than hashtags in the categories games, idioms, movies and music
(see Figures 3(c) and 3(d)). This indicates that hashtag streams in the category technology have a
stable social structure which changes less over time. This is not surprising since this semantic
category denotes a topical area, and users who are interested in such areas may consume and
provide information on a regular basis. It is especially interesting to note that the only
pragmatic measures which allow distinguishing political and technological hashtag streams are the
author-follower and author-friend overlaps, since these overlaps are significantly lower for the
category technology compared to the category political.
[Figure 4 panels: F1 measure (0.0 to 1.0), baseline performance vs. model performance, for the
Comb. Pragmatic and Lexical, Combined Pragmatic, Dynamic Pragmatic, Lexical and Static Pragmatic
models; (a) t1→0, (b) t2→1.]


Figure 4: Weighted average F1-scores of different classification models trained and tested on
t1→0 (4(a)) and t2→1 (4(b)) using 6-fold cross-validation


This indicates that the content of hashtag streams of the category political is far more likely to
be produced and consumed by the same people than content of technological hashtag streams.
   Comparing the individual measures reveals that the informational coverage (six category pairs)
and the symmetric KL divergences of followers' author-frequency distributions (six category
pairs), authors' tweet-frequency distributions (three pairs) and friends' author-frequency
distributions (three pairs) are the most discriminative measures. Figure 3 depicts the
distributions of these four measures per category. Other measures that show significant
differences in medians for both t0→1 and t1→2 are the symmetric KL divergence of followees'
author-frequency distributions (two pairs), the author-follower and the author-friend overlap (one
pair) as well as the retweet and hashtag coverage (two pairs).
   Some measures, like the conversational coverage measure, did not show any significant
differences for any of the category pairs in any time frame. This suggests that the amount of
conversational activity does not differ detectably between hashtag streams of different
categories.

4.2   Hashtag Classification
   In order to quantify the value of different pragmatic and lexical properties of hashtag streams
for predicting their semantic category, we conducted a hashtag stream classification experiment
and systematically compared the performance of various classification models trained with
different sets of features.
   Figure 4 shows the performance of the best classifier (extremely randomized trees) trained with
different sets of features. One can see from this figure that in general lexical features perform
better than pragmatic features, but also that pragmatic features (both static and dynamic)
significantly outperform the random-label baseline. This indicates that pragmatic features indeed
reveal information about a hashtag's meaning, even though they do not match the performance of
lexical features in this case. In Figure 4(a) we can see that for t1→0 the combination of lexical
and pragmatic features performs slightly better than using lexical features alone.

4.2.1   Feature Ranking
   In addition to the overall classification performance which can be achieved solely based on
analyzing the pragmatics of hashtags, we were also interested in gaining insights into the impact
of individual pragmatic features. To evaluate the individual performance of the features we used
information gain (with respect to the categories) as a ranking criterion. The ranking was
performed on t1→0 and t2→1 with stratified 6-fold cross-validation. Table 4 shows that the top
five features (i.e., the pragmatic features which reveal most about the semantics of hashtags) are
mostly features which capture the temporal dynamics of the social context of a hashtag (i.e., the
temporal follower, followee and friend dynamics) as well as the informational and hashtag
coverage. This indicates that the collective purpose for which a hashtag is used (i.e., whether it
is used to share information rather than for other purposes) and the social dynamics around a
hashtag (i.e., who uses a hashtag for whom) play a key role in understanding its semantics.

Table 4: Top features for two different datasets ranked via Information Gain
                Rank    t1→0                   t2→1
                1       informational          kl followers
                2       kl followers           informational
                3       kl friends             hashtag
                4       hashtag                kl followees
                5       norm entropy friend    kl friends

5.   DISCUSSION OF RESULTS
   Although our results show that lexical features work best within the semantic classification
task, those features are text and language dependent. Therefore, their applicability is limited to
settings where text is available. Pragmatic features, on the other hand, rely on usage information
which is independent of the type of content which is shared in social streams and can therefore
also be computed for social video or image streams.
   We believe that pragmatic features can supplement lexical features if lexical features alone
are not sufficient. In our experiments, we could see that the performance may slightly increase
when combining pragmatic and lexical features. However, the effect was not significant. We think
the reason for this is that in our setup lexical features alone already achieved good performance.
   The classification results coincide with the results of the statistical significance tests.
Ranking the properties by information gain showed that the most discriminative properties found in
4.1 (the ones that showed a statistically significant difference in both t0→1 and t1→2 for the
largest number of category pairs) were also the top ranked features (informational coverage and
the KL divergences).

6.   CONCLUSIONS AND IMPLICATIONS
   Our work suggests that the collective usage of hashtags indeed reveals information about their
semantics. However, further research is required to explore the relations between usage
information and semantics, especially in domains where limited text is available. We hope that our
research is a first step in this direction since it shows that hashtags of different semantic
categories are indeed used in different ways.
   Our work has implications for researchers and practitioners interested in investigating the
semantics of social media content. Social media applications such as Twitter provide a huge amount
of textual information. Besides textual information, usage information can also be obtained from
these platforms, and our work shows how this information can be exploited for assigning semantic
annotations to textual data streams.

7.   ACKNOWLEDGEMENTS
   This work was supported in part by a DOC-fForte fellowship of the Austrian Academy of Science to
Claudia Wagner. Furthermore, this work is in part funded by the FWF Austrian Science Fund Grant
I677 and the Know-Center Graz.

8.   REFERENCES
 [1] D. Benz, C. Körner, A. Hotho, G. Stumme, and M. Strohmaier. One tag to bind them all:
     Measuring term abstractness in social metadata. In Proc. of 8th Extended Semantic Web
     Conference ESWC 2011, Heraklion, Crete, (May 2011), 2011.
 [2] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer
     Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
 [3] P. Heymann and H. Garcia-Molina. Collaborative creation of communal hierarchical taxonomies
     in social tagging systems. Technical Report 2006-10, Computer Science Department, April 2006.
 [4] J. Huang, K. M. Thornton, and E. N. Efthimiadis. Conversational tagging in twitter. In
     Proceedings of the 21st ACM conference on Hypertext and hypermedia, HT '10, pages 173–178,
     New York, NY, USA, 2010. ACM.
 [5] C. Körner, D. Benz, M. Strohmaier, A. Hotho, and G. Stumme. Stop thinking, start tagging -
     tag semantics emerge from collaborative verbosity. In Proceedings of the 19th International
     World Wide Web Conference (WWW 2010), Raleigh, NC, USA, Apr. 2010. ACM.
 [6] D. Laniado and P. Mika. Making sense of twitter. In P. F. Patel-Schneider, Y. Pan,
     P. Hitzler, P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm, editors, International
     Semantic Web Conference (1), volume 6496 of Lecture Notes in Computer Science, pages 470–485.
     Springer, 2010.
 [7] C. Messina. Groups for twitter; or a proposal for twitter tag channels.
     http://factoryjoe.com/blog/2007/08/25/groups-for-twitter-or-a-proposal-for-twitter-tag-channels/,
     2007.
 [8] M. G. Noll and C. Meinel. Exploring social annotations for web document classification. In
     Proceedings of the 2008 ACM symposium on Applied computing, SAC '08, pages 2315–2320,
     New York, NY, USA, 2008. ACM.
 [9] M. G. Noll and C. Meinel. The metadata triumvirate: Social annotations, anchor texts and
     search queries. In Web Intelligence, pages 640–647. IEEE, 2008.
[10] S. Overell, B. Sigurbjörnsson, and R. van Zwol. Classifying tags using open content
     resources. In Proceedings of the Second ACM International Conference on Web Search and Data
     Mining, WSDM '09, pages 64–73, New York, NY, USA, 2009. ACM.
[11] S. Panzeri and A. Treves. Analytical estimates of limited sampling biases in different
     information measures. Network: Computation in Neural Systems, 7(1):87–107, 1996.
[12] D. M. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information
     diffusion across topics: idioms, political hashtags, and complex contagion on twitter. In
     Proceedings of the 20th international conference on World wide web, WWW '11, pages 695–704,
     New York, NY, USA, 2011. ACM.
[13] P. Schmitz. Inducing ontology from flickr tags. In Proceedings of the Workshop on
     Collaborative Tagging at WWW2006, Edinburgh, Scotland, May 2006.
[14] M. Strohmaier, D. Helic, D. Benz, C. Körner, and R. Kern. Evaluation of folksonomy induction
     algorithms. Transactions on Intelligent Systems and Technology, 2012.
[15] O. Tsur and A. Rappoport. What's in a hashtag?: content based prediction of the spread of
     ideas in microblogging communities. In Proceedings of the fifth ACM international conference
     on Web search and data mining, WSDM '12, pages 643–652, New York, NY, USA, 2012. ACM.
[16] C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual
     structures from social awareness streams. In Semantic Search Workshop at WWW2010, 2010.
[17] L. Wittgenstein. Philosophical Investigations. Blackwell Publishers, London, United Kingdom,
     1953. Republished 2001.
[18] L. Yang, T. Sun, M. Zhang, and Q. Mei. We know what @you #tag: does the dual role affect
     hashtag adoption? In Proceedings of the 21st international conference on World Wide Web,
     WWW '12, pages 261–270, New York, NY, USA, 2012. ACM.

Predicting hashtag categories using usage patterns

  • 1. Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter

    Lisa Posch, Graz University of Technology, Knowledge Management Institute, Inffeldgasse 13, 8010 Graz, Austria, lposch@sbox.tugraz.at
    Claudia Wagner, JOANNEUM RESEARCH, Institute for Information and Communication Technologies, Steyrergasse 17, 8010 Graz, Austria, clauwa@sbox.tugraz.at
    Philipp Singer, Graz University of Technology, Knowledge Management Institute, Inffeldgasse 13, 8010 Graz, Austria, philipp.singer@tugraz.at
    Markus Strohmaier, Graz University of Technology, Knowledge Management Institute, Inffeldgasse 13, 8010 Graz, Austria, markus.strohmaier@tugraz.at

    ABSTRACT
    This paper sets out to explore whether data about the usage of hashtags on Twitter contains information about their semantics. Towards that end, we perform initial statistical hypothesis tests to quantify the association between usage patterns and semantics of hashtags. To assess the utility of pragmatic features – which describe how a hashtag is used over time – for semantic analysis of hashtags, we conduct various hashtag stream classification experiments and compare their utility with the utility of lexical features. Our results indicate that pragmatic features indeed contain valuable information for classifying hashtags into semantic categories. Although pragmatic features do not outperform lexical features in our experiments, we argue that pragmatic features are important and relevant for settings in which textual information might be sparse or absent (e.g., in social video streams).

    Categories and Subject Descriptors
    H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services

    Keywords
    Twitter; hashtags; social structure; semantics

    1. INTRODUCTION
    A hashtag is a string of characters preceded by the hash (#) character and it is used on platforms like Twitter as a descriptive label or to build communities around particular topics [15]. To outside observers, the meaning of hashtags is usually difficult to analyze, as they consist of short, often abbreviated or concatenated concepts (e.g., #MSM2013). Thus, new methods and techniques for analyzing the semantics of hashtags are definitely needed.

    A simplistic view on Wittgenstein’s work [17] suggests that meaning is use. This indicates that the meaning of a word is not defined by a reference to the object it denotes, but by the variety of uses to which the word is put. Therefore, one can use the narrow, lexical context of a word (i.e., its co-occurring words) to approximate its meaning. Our work builds on this observation, but focuses on the pragmatics of a word (i.e., how a word, or in our case a hashtag, is used by a large group of users) – rather than its narrow, lexical context.

    The aim of this work is to investigate to what extent pragmatic characteristics of a hashtag (which capture how a large group of users uses a hashtag) may reveal information about its semantics. Specifically, our work addresses the following research questions:

      • Do different semantic categories of hashtags reveal substantially different usage patterns?
      • To what extent do pragmatic and lexical properties of hashtags help to predict the semantic category of a hashtag?

    To address these research questions we conducted an empirical study on a broad range of diverse hashtag streams belonging to eight different semantic categories (such as technology, sports or idioms) which have been identified in previous research [12] and have been shown to be useful for grouping hashtags. From each of the eight categories, we selected ten sample hashtags at random and collected temporal snapshots of messages containing at least one of these hashtags at three different points in time. To quantify how hashtags are used over time, we extended the set of pragmatic stream measures which we introduced in our previous work [16] and applied them to the hashtag streams in our dataset. These pragmatic measures capture not only the social structure of a hashtag at specific points in time, but also the changes in social structure over time.

    Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author’s site if the Material is used in electronic media. WWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil. ACM 978-1-4503-2038-2/13/05.
  • 2. To answer the first research question, we used standard statistical tests which allow us to quantify the association between pragmatic characteristics of hashtag streams and their semantic categories. To tackle the second research question, we first computed lexical features using a standard bag-of-words model with term frequency (TF). Then, we trained several classification models with lexical features only, pragmatic features only and a combination of both. We compared the performance of different classification models by using standard evaluation measures such as the F1-score (which is defined as the harmonic mean of precision and recall). To get a fair baseline for our classification models, we constructed a control dataset by randomly shuffling the category labels of the hashtag streams. That means we destroyed the original relationship between the pragmatic properties and the semantic categories of hashtags.

    Our results show that pragmatic features indeed reveal information about hashtags’ semantics and perform significantly better than the baseline. They can therefore be useful for the task of semantically annotating social media content. Not surprisingly, our results also show that lexical features are more suitable than pragmatic features for the task of semantically categorizing hashtag streams. However, an advantage of pragmatic features is that they are language- and text-independent. Pragmatic features can be applied to tasks where the creation of lexical features is not possible – such as multimedia streams. Also for scenarios where textual content is available, pragmatic features allow for more flexibility due to their independence of the language used in the corpus. Our results are relevant for social media and semantic web researchers who are interested in analyzing the semantics of hashtags in textual or non-textual social streams (e.g., social video streams).

    This paper is structured as follows: Section 2 gives an overview of related research on analyzing the semantics of tags in social bookmarking systems and research on hashtagging on Twitter in general. In Section 3 we describe our experimental setup, including our datasets, feature engineering and evaluation approach. Our results are reported in Section 4 and further discussed in Section 5. Finally, we conclude our work in Section 6.

    2. RELATED WORK
    In the past, a considerable effort has been spent on studying the semantics of tags (e.g., tags in social bookmarking systems), but also hashtags in Twitter have received attention from the research community.

    Semantics of tags: On the one hand, researchers explored to what extent semantics emerge from folksonomies by investigating different algorithms for extracting tag networks and hierarchies from such systems (see e.g., [1], [3] or [13]). The work of [14] evaluated three state-of-the-art folksonomy induction algorithms in the context of five social tagging systems. Their results show that those algorithms specifically developed to capture intuitions of social tagging systems outperform traditional hierarchical clustering techniques. Körner et al. [5] investigated how tagging usage patterns influence the quality of the emergent semantics. They found that ‘verbose’ taggers (describers) are more useful for the emergence of tag semantics than users who use a small set of tags (categorizers).

    On the other hand, researchers investigated to what extent tags (and the resources they annotate) can be semantically grounded and classified into predefined semantic categories. For example, Noll and Meinel [8] presented a study of the characteristics of tags and determined their usefulness for web page classification [9]. Similar to our work, Overell et al. [10] presented an approach which allows classifying tags into semantic categories. They trained a classifier to classify Wikipedia articles into semantic categories, mapped Flickr tags to Wikipedia articles using anchor texts in Wikipedia and finally classified Flickr tags into semantic categories by using the previously trained classifier. Their results show that their ClassTag system increases the coverage of the vocabulary by 115% compared to a simple WordNet approach which classifies Flickr tags by mapping them to WordNet via string matching techniques. Unlike our work, they did not take into account how tags are used, but learn relations between tags and semantic categories via mapping them to Wikipedia articles.

    Pragmatics and semantics of hashtags: On Twitter, users have developed a tagging culture by adding a hash symbol (#) in front of a short keyword. The first introduction of the usage of hashtags was provided by Chris Messina in a blog post [7]. Huang et al. [4] state that this kind of new tagging culture has created a completely new phenomenon, called micro-meme. The difference between such micro-memes and other social tagging systems is that the participation in micro-memes is an a-priori approach, while other social tagging systems follow an a-posteriori approach. This is due to the fact that users are influenced by the observation of the usage of micro-meme hashtags adopted by other users. The work of [4] suggests that hashtagging in Twitter is more commonly used to join public discussions than to organize content for future retrieval. The role of hashtags has also been investigated in [18]. Their study confirms that a hashtag serves both as a tag of content and a symbol of community membership. Laniado and Mika [6] explored to what extent hashtags can be used as strong identifiers like URIs are used in the Semantic Web. They measured the quality of hashtags as identifiers for the Semantic Web, defining several metrics to characterize hashtag usage on the dimensions of frequency, specificity, consistency, and stability over time. Their results indicate that the lexical usage of hashtags can indeed be used to identify hashtags which have the desirable properties of strong identifiers. Unlike our work, their work focuses on lexical usage patterns and measures to what extent those patterns contribute to the differentiation between strong and weak semantic identifiers (binary classification) while we use usage patterns to classify hashtags into semantic categories.

    Recently, researchers have also started to explore the diffusion dynamics of hashtags - i.e., how hashtags spread in online communities. For example the work of [15] aims to predict the exposure of a hashtag in a given time frame while [12] are interested in the temporal spreading patterns of hashtags.

    3. EXPERIMENTAL SETUP
    Our experiments are designed to explore to what extent pragmatic properties of hashtag streams can be used to gauge the semantic category of a hashtag. We are not only interested in the idiosyncrasies of hashtag usage within one semantic category but also in the deltas between different semantic categories. In this section, we first introduce our dataset as well as the pragmatic and lexical measures which
  • 3. we used to describe hashtag streams. Then we present the methodology and evaluation approach which we used to answer our research questions.

    3.1 Dataset
    In this work we use data that we acquired from Twitter’s API. Romero et al. [12] conducted a user study and a classification experiment and identified eight broad semantic categories of hashtags: celebrity, games, idiom, movies/TV, music, political, sports and technology. We used a list consisting of the 500 hashtags which were used by most users within their dataset and which were manually assigned to the eight categories as a starting point for creating our own dataset.

    For each category, we chose ten hashtags at random (see Table 1). We biased our random sample towards active hashtag streams by re-sampling hashtags for which we found fewer than 1000 posts at the beginning of our data collection (March 4th, 2012). For those categories for which we could not find ten hashtags that had more than 1000 posts (i.e., games and celebrity), we selected the most active hashtags per category (i.e., the hashtags for which we found the most posts).

    Table 1: Randomly selected hashtags per category (ordered alphabetically)
    technology   idioms            sports    political   games         music        celebrity         movies
    blackberry   factaboutme       f1        climate     e3            bsb          ashleytisdale     avatar
    ebay         followfriday      football  gaza        games         eurovision   brazilmissesdemi  bbcqt
    facebook     dontyouhate       golf      healthcare  gaming        lastfm       bsb               bones
    flickr       iloveitwhen       nascar    iran        mafiawars     listeningto  michaeljackson    chuck
    google       iwish             nba       mmot        mobsterworld  mj           mj                glee
    iphone       nevertrust        nhl       noh8        mw2           music        niley             glennbeck
    microsoft    omgfacts          redsox    obama       ps3           musicmonday  regis             movies
    photoshop    oneofmyfollowers  soccer    politics    spymaster     nowplaying   teamtaylor        supernatural
    socialmedia  rememberwhen      sports    teaparty    uncharted2    paramore     tilatequila       tv
    twitter      wheniwaslittle    yankees   tehran      wow           snsd         weloveyoumiley    xfactor

    The dataset consists of three parts, each part representing a time frame of four weeks. The different time frames ensure that we can observe the usage of a hashtag over a given period of time. The time frames are independent of each other, i.e., the data collected at one time frame does not contain any information of the data collected at another time frame.

    At the start of each time frame, we retrieved the most recent tweets in English for each hashtag using Twitter’s public search API. Afterwards, we retrieved the followers and followees of each user who had authored at least one message in our hashtag stream dataset. Some pragmatic features capture information about who potentially consumes a hashtag stream (followers) or who potentially informs authors of a hashtag stream (followees) and therefore require the one-hop neighborhood of hashtag streams’ authors. In this work, we call users who hold both of these roles (i.e., have established a bidirectional link with an author) friends. The starting dates of the time frames were March 4th (t0), April 1st (t1) and April 29th, 2012 (t2). Table 2 depicts the number of tweets and relations between users that we collected during each time frame.

    Table 2: Description of the complete dataset
                                 t0           t1           t2
    Tweets                       94,634       94,984       95,105
    Authors                      53,593       54,099       53,750
    Followers                    56,685,755   58,822,119   66,450,378
    Followees                    34,025,961   34,263,129   37,674,363
    Friends                      21,696,134   21,914,947   24,449,705
    Mean Followers per Author    1,057.71     1,087.31     1,236.29
    Mean Followees per Author    634.90       633.34       700.92
    Mean Friends per Author      404.83       405.09       454.88

    The stream tweets were retrieved on the first day of each time frame, fetching tweets that were authored a maximum of seven days previous to the date of retrieval. During the first week of each time frame, the user IDs of the followers and followees were collected. Figure 1 depicts this process.

    [Figure 1: Timeline of the data collection process. Crawls of stream tweets and social structure at t0 (3/4/2012), t1 (4/1/2012) and t2 (4/29/2012), with a 1-week span marked.]

    Since we were interested in learning what types of characteristics are useful for describing a semantic hashtag category, we removed hashtag streams that belong to multiple categories (concretely, we removed the two hashtags #bsb and #mj). We also decided to remove inactive hashtag streams (those where fewer than 300 posts were retrieved) as estimating information theoretic measures is problematic if only few observations are available [11]. The most common solution is to restrict the measurements to situations where one has an adequate amount of data. We found four inactive hashtags in the category games and seven in the category celebrity. The removal of these hashtag streams resulted in the complete removal of the category celebrity as it was only left with one hashtag stream (#michaeljackson). A possible explanation for the low number of tweets in the hashtag streams for this category is that topics related to celebrities have a shorter life-span than topics related to other categories. Our final datasets consist of 64 hashtag streams and seven semantic categories which were sufficiently active during our observation period.

    3.2 Feature Engineering
    In the following, we define the pragmatic and lexical features which we designed to capture the different social and message based structures of hashtag streams. For our pragmatic features we further differentiate between static pragmatic features (which capture the social structure of a hashtag at a specific point in time) and dynamic pragmatic features (which combine information from several time points).

    3.2.1 Static Pragmatic Measures:
    Entropy Measures are used to measure the randomness of streams’ authors and their followers, followees and friends. For each hashtag stream, we rank the authors by the number of messages they published in that stream (norm entropy author) and we rank the followers (norm entropy follower), followees (norm entropy followee) and
  • 4. friends (norm entropy friend) by the number of stream’s authors they are related with. A high author entropy indicates that the stream is created in a democratic way since all authors contribute equally much. A high follower entropy and friend entropy indicate that the followers and friends do not focus their attention towards few authors but distribute it equally across all authors. A high followee entropy and friend entropy indicate that the authors do not focus their attention on a selected part of their audience.

    Overlap Measures describe the overlap between the authors and the followers (overlap authorfollower), followees (overlap authorfollowee) or friends (overlap authorfriend) of a hashtag stream. If overlap is one, all authors of a stream are also followers, followees or friends of stream authors. This indicates that the stream is consumed and produced by the same users. A high overlap suggests that the community around the hashtag is rather closed, while a low overlap indicates that the community is more open and that the active and passive parts of the community do not extensively overlap.

    Coverage Measures characterize a hashtag stream via the nature of its messages. We introduce four coverage measures. The informational coverage measure (informational) indicates how many messages of a stream have an informational purpose - i.e., contain a link. The conversational coverage (conversational) measures the mean number of messages of a stream that have a conversational purpose - i.e., those messages that are directed to one or several specific users (e.g., through @replies). The retweet coverage (retweet) measures the percentage of messages which are retweets. The hashtag coverage (hashtag) measures the mean number of hashtags per message in a stream.

    3.2.2 Dynamic Pragmatic Measures:
    To explore how the social structure of a hashtag stream changes over time, we measure the distance between the tweet-frequency distributions of authors at different time points, and the author-frequency distributions of followers, followees or friends at different time points. The intuition behind these features is that certain semantic categories of hashtags may have a fast changing social structure since new people start and stop using those types of hashtags frequently, while other semantic categories may have a more stable community around them which changes less over time.

    We use a symmetric variation of the Kullback-Leibler divergence ($D_{KL}$) which represents a natural distance measure between two probability distributions (A and B) and is defined as follows: $\frac{1}{2} D_{KL}(A \| B) + \frac{1}{2} D_{KL}(B \| A)$. The KL divergence is also known as relative entropy or information divergence. The KL divergence is zero if the two distributions are identical and approaches infinity as they differ more and more. We measure the KL divergence for the distributions of authors (kl authors), followers (kl followers), followees (kl followees) and friends (kl friends).

    Figure 2 visualizes the different time frames and their notation. t0 only contains the static features computed from data collected at t0. Consequently, t1 and t2 only contain the static features computed from data collected at t1 or t2, respectively. t0→1 includes static features computed on data collected at t0 and the dynamic measures computed on data collected at t0 and t1. t1→0 includes static features computed on data collected at t1 and the dynamic measures computed on data collected at t0 and t1. t1→2 and t2→1 are defined in the same way.

    [Figure 2: Illustration of our time frames (time points t0, t1 and t2 and the derived frames t0→1, t1→2, t1→0 and t2→1).]

    3.2.3 Lexical Measures:
    We use vector-based methods which allow representing each microblog message as a vector of terms and use term frequency (TF) as weighting schema. In this work lexical measures are always computed for individual time points and are therefore static measures.

    3.3 Usage Patterns of Hashtag Categories
    Our first aim is to investigate whether different semantic categories of hashtags reveal substantially different usage patterns (such as that they are used and/or consumed by different sets of users or that they are used for different purposes). To compare the pragmatic fingerprints of hashtags belonging to different semantic categories and to quantify the differences between categories, we conducted a pairwise Mann-Whitney-Wilcoxon-Test which is a non-parametric statistical hypothesis test for assessing whether one of two samples of independent observations tends to have larger values than the other. We used a non-parametric test since the Shapiro-Wilk-Test revealed that not all features are normally distributed, even after applying arcsine transformation to ratio measures. The Holm-Bonferroni method was used to adjust the p-values and counteract the problem of multiple comparisons. For this experiment, we used the timeframes t0→1 and t1→2.

    3.4 Hashtag Classification
    Our second aim is to investigate to what extent pragmatic and lexical properties of hashtag streams contribute to classifying them into their semantic categories. That means we aim to classify temporal snapshots of hashtag streams into their correct semantic categories (to which they were assigned in [12]) just by analyzing how they are used over time. We then compare the performance of the pragmatically informed classifier with the performance of a classifier informed by lexical features within a semantic multiclass classification task. We used the timeframes t1→0 and t2→1 for this experiment in order to avoid including information from the ‘future’ in our classification.

    We performed grid search with varying hyperparameters using Support Vector Machine (linear and RBF kernels) and an ensemble method with extremely randomized trees. Since extremely randomized trees are a probabilistic method and perform slightly differently in each run, we ran them ten times and report the average scores. The features were standardized by subtracting the mean and scaling to unit variance. We used stratified 6-fold cross-validation (CV) to train and test each classification model.
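    As a minimal sketch of the symmetric KL divergence defined in Section 3.2.2, the following Python computes it for two frequency distributions, for example tweets per author at two time points. The additive smoothing constant (needed because users present at one time point may be absent at the other), the base-2 logarithm and all function names are assumptions made for illustration; the text above does not specify them.

```python
import math


def to_distribution(counts, support, eps=1e-9):
    """Turn raw counts into a probability distribution over `support`.

    `eps` is an assumed additive smoothing constant so that users seen at
    only one time point do not produce a division by zero.
    """
    total = sum(counts.get(key, 0) + eps for key in support)
    return {key: (counts.get(key, 0) + eps) / total for key in support}


def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over the same keys."""
    return sum(p[key] * math.log2(p[key] / q[key]) for key in p)


def symmetric_kl(counts_a, counts_b, eps=1e-9):
    """Symmetric variant: 1/2 D_KL(A || B) + 1/2 D_KL(B || A)."""
    support = set(counts_a) | set(counts_b)
    a = to_distribution(counts_a, support, eps)
    b = to_distribution(counts_b, support, eps)
    return 0.5 * kl_divergence(a, b) + 0.5 * kl_divergence(b, a)


# Hypothetical tweet counts per author at two time points.
authors_t0 = {"user1": 5, "user2": 3, "user3": 2}
authors_t1 = {"user1": 1, "user2": 1, "user4": 8}
print(symmetric_kl(authors_t0, authors_t1))
```

    The value is zero for identical distributions and grows as the two author populations diverge. The choice of `eps` strongly influences the magnitude whenever the supports differ, so values from this sketch are not comparable to those reported in the figures.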
  • 5. Since we have two different types of pragmatic features, the static and dynamic ones, we trained and tested three separate classification models which were only informed by pragmatic information:

    Static Pragmatic Model: We trained and tested this classification model with static pragmatic features on data collected at t1 using stratified 6-fold CV. The experiment was repeated on the data collected at t2.

    Dynamic Pragmatic Model: We trained and tested the classification model with dynamic pragmatic features on data collected at t0 and t1 using stratified 6-fold CV. The computation of our dynamic features requires at least two time points. We repeated this experiment on data collected at t1 and t2.

    Combined Pragmatic Model: We combined the static and dynamic pragmatic features, and trained and tested the classification model on the data of t1→0 using stratified 6-fold CV. Again, we repeated the experiment on the data of t2→1.

    We also performed our classification with a model using our lexical features (i.e., TF weighted words). Finally, we trained and tested a combined classification model using pragmatic and lexical features, which leads to the following classification models:

    Lexical Model: We trained and tested the model on data from t1 using stratified 6-fold CV and repeated the experiment on data collected at t2.

    Combined Pragmatic and Lexical Model: We trained and tested the mixed classifier on the data of t1→0 using stratified 6-fold CV, then repeated this experiment for t2→1. A simple concatenation of pragmatic and lexical features is not useful, since the vast number of lexical features would overrule the pragmatic features. Therefore, we used a stacking method (see [2]) and first performed a classification using lexical features alone and Leave-One-Out cross-validation. We used an SVM with linear kernel for this classification since it worked best for these features. Secondly, we combined the pragmatic features with the resulting seven probability features which we got from the previous classification model and which describe how likely each semantic class is for a certain stream given its words.

    To get a fair baseline for our experiment, we constructed a control dataset by randomly shuffling the category labels of the 64 hashtag streams. That means we destroyed the original relationship between the pragmatic properties and the semantic categories of hashtags and evaluated the performance of a classifier which tries to use pragmatic properties to classify hashtags into their shuffled categories within a 6-fold cross-validation. We repeated the random shuffling 100 times and used the resulting average F1-score as our baseline performance. For the baseline classifier we also used grid search to determine the optimal parameters prior to training. Our baseline classifier tests how well randomly assigned categories can be identified compared to our real semantic categories. One needs to note that a simple random guesser baseline would be a weaker baseline than the one described above and would lead to a performance of 1/7.

    To gain further insights into the impact of individual properties, we analyzed their information gain (IG) with respect to the categories. The information gain measures how accurately a specific stream property P is able to predict stream’s category C and is defined as follows: $IG(C, P) = H(C) - H(C \mid P)$ where H denotes the entropy.

    4. RESULTS
    In the following section, we present the results from our empirical study on usage patterns of different semantic categories of hashtags.

    4.1 Usage Patterns of Hashtag Categories
    To answer our first research question, we explored to what extent usage patterns of hashtag streams in different semantic categories are indeed significantly different.

    Our results indicate that some pragmatic measures are indeed significantly different for distinct semantic categories. This indicates that hashtags of certain categories are used in a very specific way which may allow us to relate these hashtags with their semantic categories just by observing how users use them. Table 3 depicts the measures that show statistically significant (p < 0.05) differences in both t0→1 and t1→2. In total, 35 pragmatic category differences were found to be statistically significant (with p < 0.05) for t0→1 and 33 for t1→2. 26 pragmatic category differences were found to be significant for both t0→1 and t1→2 which suggests that results are independent of our choice of time frame.

    Table 3: Features which showed a statistically significant difference (with p < 0.05) for each pair of categories in both t0→1 and t1→2
    Columns: games, idioms, movies, music, political, sports (features per row are listed in their original order)
    idioms:      informational, retweet
    movies:      informational
    music:       informational
    political:   kl followers, kl authors, kl followers, kl followees, informational, hashtag
    sports:      kl followers, kl authors, kl followers, informational
    technology:  kl followers, kl authors, kl friends, kl friends, overlap authorfollower, kl followers, overlap authorfriend, kl followees, kl friends, informational, retweet, hashtag
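    The information gain $IG(C, P) = H(C) - H(C \mid P)$ used in Section 3.4 can be sketched in a few lines of Python. This sketch assumes the stream property P has already been discretized into a small number of values (e.g., "low"/"high"); how the continuous pragmatic features were binned is not stated in the text, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy H (in bits) of a list of category labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(categories, property_values):
    """IG(C, P) = H(C) - H(C | P) for a discretized property P."""
    groups = defaultdict(list)
    for category, value in zip(categories, property_values):
        groups[value].append(category)
    n = len(categories)
    # H(C | P): entropy of the categories within each property value,
    # weighted by how often that value occurs.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(categories) - conditional


# Hypothetical example: a binned informational coverage that separates
# idioms streams from technology streams perfectly.
categories = ["idioms", "idioms", "technology", "technology"]
informational = ["low", "low", "high", "high"]
print(information_gain(categories, informational))
```

    A property that perfectly separates the categories attains the maximum gain H(C), while a property that is independent of the categories yields a gain of zero.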
  • 6. Not surprisingly, the category which shows the most specific usage patterns is idioms and therefore the hashtags of this category can be distinguished from all other hashtags just by analyzing their pragmatic properties. Hashtag streams of the category idioms exhibit a significantly lower informational coverage than hashtag streams of all other categories (see Figure 3(a)) and a significantly higher symmetric KL divergence for authors’ tweet-frequency distributions (see Figure 3(b)). Also the followers’ and friends’ author-frequency distributions tend to have a higher symmetric KL divergence for idioms hashtags than for other hashtags (see Figures 3(c) and 3(d)). This indicates that the social structure of hashtag streams in the category idioms changes faster than that of hashtags of other categories. Furthermore, hashtag streams of this category are less informative - i.e., contain significantly fewer links per message on average.

    [Figure 3: Each plot shows the feature distribution of different categories of one of the 4 best pragmatic features for t0→1. We obtained similar results for t1→2. In each panel the categories shown are technology, sports, political, music, movies, idioms and games.
      (a) Informational Coverage: the percentage of messages of hashtag streams belonging to different categories that contain at least one link.
      (b) KL Divergence Authors: how much the authors’ tweet-frequency distributions of hashtag streams of different categories change on average.
      (c) KL Divergence Followers: how much the followers’ author-frequency distributions of hashtag streams of different categories change on average.
      (d) KL Divergence Friends: how much the friends’ author-frequency distributions of hashtag streams of different categories change on average.]

    The category technology can be distinguished from all other categories except sports, particularly because its followers’ and friends’ author-frequency distributions have significantly lower symmetric KL divergences than hashtags in the categories games, idioms, movies and music (see Figures 3(c) and 3(d)). This indicates that hashtag streams in the category technology have a stable social structure which changes less over time. This is not surprising since this semantic category denotes a topical area and users who are interested in such areas may consume and provide information on a regular basis. It is especially interesting to note that the only pragmatic measures which allow distinguishing political and technological hashtag streams are the author-follower and author-friend overlaps since these overlaps are significantly lower for the category technology compared to the category political.
This indicates that the content of hashtag streams of the category political is far more likely to be produced and consumed by the same people than content of technological hashtag streams.

Comparing the individual measures reveals that the informational coverage (six category pairs) and the symmetric KL divergences of followers' author-frequency distributions (six category pairs), authors' tweet-frequency distributions (three pairs) and friends' follower-frequency distributions (three pairs) are the most discriminative measures. Figure 3 depicts the distributions of these four measures per category. Other measures that show significant differences in medians for both t0→1 and t1→2 are the symmetric KL divergence of followees' author-frequency distributions (two pairs), the author-follower and the author-friend overlap (one pair) as well as the retweet and hashtag coverage (two pairs).

Some measures, like the conversational coverage measure, did not show any significant differences for any of the category pairs in any time frame. This indicates that in all hashtag streams an equal amount of conversational activity takes place.

4.2 Hashtag Classification

In order to quantify the value of different pragmatic and lexical properties of hashtag streams for predicting their semantic category, we conducted a hashtag stream classification experiment and systematically compared the performance of various classification models trained with different sets of features.

Figure 4 shows the performance of the best classifier (extremely randomized trees) trained with different sets of features. One can see from this figure that in general lexical features perform better than pragmatic features, but also that pragmatic features (both static and dynamic) significantly outperform a random baseline. This indicates that pragmatic features indeed reveal information about a hashtag's meaning, even though they do not match the performance of lexical features in this case. In Figure 4(a) we can see that for t1→0 the combination of lexical and pragmatic features performs slightly better than using lexical features alone.

[Figure 4: bar charts of the F1 measure (y-axis, 0.0 to 1.0) showing baseline performance and model performance for the feature sets Static Pragmatic, Dynamic Pragmatic, Combined Pragmatic, Lexical, and Comb. Pragmatic and Lexical. Panels: (a) t1→0, (b) t2→1.]

Figure 4: Weighted averaged F1-scores of different classification models trained and tested on t1→0 (a) and t2→1 (b) using 6-fold cross-validation

4.2.1 Feature Ranking

In addition to the overall classification performance which can be achieved solely based on analyzing the pragmatics of hashtags, we were also interested in gaining insights into the impact of individual pragmatic features. To evaluate the individual performance of the features we used information gain (with respect to the categories) as a ranking criterion. The ranking was performed on t1→0 and t2→1 with stratified 6-fold cross-validation. Table 4 shows that the top five features (i.e., the pragmatic features which reveal most about the semantics of hashtags) are features which capture the temporal dynamics of the social context of a hashtag (i.e., the temporal follower, followee and friend dynamics) as well as the informational and hashtag coverage. This indicates that the collective purpose for which a hashtag is used (i.e., whether it is used to share information rather than for other purposes) and the social dynamics around a hashtag (i.e., who uses a hashtag for whom) play a key role in understanding its semantics.

Table 4: Top features for two different datasets ranked via Information Gain

Rank  t1→0                 t2→1
1     informational        kl followers
2     kl followers         informational
3     kl friends           hashtag
4     hashtag              kl followees
5     norm entropy friend  kl friends

5. DISCUSSION OF RESULTS

Although our results show that lexical features work best within the semantic classification task, those features are text and language dependent. Therefore, their applicability is limited to settings where text is available. Pragmatic features, on the other hand, rely on usage information which is independent of the type of content which is shared in social streams and can therefore also be computed for social video or image streams.
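To illustrate why such features are content independent, the following minimal sketch computes one dynamic pragmatic measure, the symmetric KL divergence between the author-frequency distributions of two time frames, from author IDs alone. It is not our original implementation: the function names, the additive smoothing constant alpha, and the choice to sum both KL directions are assumptions made for this example, and the definition given with the measures earlier in the paper takes precedence.

```python
import math
from collections import Counter


def frequency_distribution(author_ids, support, alpha=1.0):
    """Turn a list of author IDs into a smoothed probability distribution.

    alpha is an additive smoothing constant (an assumption of this sketch);
    it keeps every probability non-zero so the KL divergence stays finite.
    """
    counts = Counter(author_ids)
    total = len(author_ids) + alpha * len(support)
    return {a: (counts[a] + alpha) / total for a in support}


def kl_divergence(p, q):
    """KL divergence D(p || q) for two distributions over the same support."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p)


def symmetric_kl(authors_t0, authors_t1, alpha=1.0):
    """Symmetric KL divergence, taken here as D(p || q) + D(q || p)."""
    support = sorted(set(authors_t0) | set(authors_t1))
    if not support:
        return 0.0  # two empty streams are treated as identical
    p = frequency_distribution(authors_t0, support, alpha)
    q = frequency_distribution(authors_t1, support, alpha)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical streams: each entry is the author ID of one message.
stable = symmetric_kl(["a", "a", "b", "c"], ["a", "b", "b", "c"])
shifting = symmetric_kl(["a", "a", "b", "c"], ["x", "y", "y", "z"])
print(f"stable stream: {stable:.3f}, shifting stream: {shifting:.3f}")
```

Since the input consists only of author IDs per message, the same computation would apply to a stream of images or videos, provided the platform exposes who posted each item. A stream whose set of authors is replaced between the two time frames yields a larger value than one whose authors persist.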
We believe that pragmatic features can supplement lexical features if lexical features alone are not sufficient. In our experiments, we could see that the performance may slightly increase when combining pragmatic and lexical features. However, the effect was not significant. We think the reason for this is that in our setup lexical features alone already achieved good performance.

The classification results coincide with the results of the statistical significance tests. Ranking the properties by information gain showed that the most discriminative properties found in 4.1 (the ones that showed a statistical significance in both t0→1 and t1→2 for the highest number of category pairs) were also the top ranked features (informational coverage and the KL divergences).

6. CONCLUSIONS AND IMPLICATIONS

Our work suggests that the collective usage of hashtags indeed reveals information about their semantics. However, further research is required to explore the relations between usage information and semantics, especially in domains where limited text is available. We hope that our research is a first step in this direction since it shows that hashtags of different semantic categories are indeed used in different ways.

Our work has implications for researchers and practitioners interested in investigating the semantics of social media content. Social media applications such as Twitter provide a huge amount of textual information. Besides the textual information, usage information can also be obtained from these platforms, and our work shows how this information can be exploited for assigning semantic annotations to textual data streams.

7. ACKNOWLEDGEMENTS

This work was supported in part by a DOC-fForte fellowship of the Austrian Academy of Science to Claudia Wagner. Furthermore, this work is in part funded by the FWF Austrian Science Fund Grant I677 and the Know-Center Graz.

8. REFERENCES

[1] D. Benz, C. Körner, A. Hotho, G. Stumme, and M. Strohmaier. One tag to bind them all: Measuring term abstractness in social metadata. In Proc. of 8th Extended Semantic Web Conference ESWC 2011, Heraklion, Crete, (May 2011), 2011.
[2] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.
[3] P. Heymann and H. Garcia-Molina. Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical Report 2006-10, Computer Science Department, April 2006.
[4] J. Huang, K. M. Thornton, and E. N. Efthimiadis. Conversational tagging in twitter. In Proceedings of the 21st ACM conference on Hypertext and hypermedia, HT ’10, pages 173–178, New York, NY, USA, 2010. ACM.
[5] C. Körner, D. Benz, M. Strohmaier, A. Hotho, and G. Stumme. Stop thinking, start tagging - tag semantics emerge from collaborative verbosity. In Proceedings of the 19th International World Wide Web Conference (WWW 2010), Raleigh, NC, USA, Apr. 2010. ACM.
[6] D. Laniado and P. Mika. Making sense of twitter. In P. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm, editors, International Semantic Web Conference (1), volume 6496 of Lecture Notes in Computer Science, pages 470–485. Springer, 2010.
[7] C. Messina. Groups for twitter; or a proposal for twitter tag channels. http://factoryjoe.com/blog/2007/08/25/groups-for-twitter-or-a-proposal-for-twitter-tag-channels/, 2007.
[8] M. G. Noll and C. Meinel. Exploring social annotations for web document classification. In Proceedings of the 2008 ACM symposium on Applied computing, SAC ’08, pages 2315–2320, New York, NY, USA, 2008. ACM.
[9] M. G. Noll and C. Meinel. The metadata triumvirate: Social annotations, anchor texts and search queries. In Web Intelligence, pages 640–647. IEEE, 2008.
[10] S. Overell, B. Sigurbjörnsson, and R. van Zwol. Classifying tags using open content resources. In Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM ’09, pages 64–73, New York, NY, USA, 2009. ACM.
[11] S. Panzeri and A. Treves. Analytical estimates of limited sampling biases in different information measures. Network: Computation in Neural Systems, 7(1):87–107, 1996.
[12] D. M. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. In Proceedings of the 20th international conference on World wide web, WWW ’11, pages 695–704, New York, NY, USA, 2011. ACM.
[13] P. Schmitz. Inducing ontology from flickr tags. In Proceedings of the Workshop on Collaborative Tagging at WWW2006, Edinburgh, Scotland, May 2006.
[14] M. Strohmaier, D. Helic, D. Benz, C. Körner, and R. Kern. Evaluation of folksonomy induction algorithms. Transactions on Intelligent Systems and Technology, 2012.
[15] O. Tsur and A. Rappoport. What’s in a hashtag?: content based prediction of the spread of ideas in microblogging communities. In Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pages 643–652, New York, NY, USA, 2012. ACM.
[16] C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures from social awareness streams. In Semantic Search Workshop at WWW2010, 2010.
[17] L. Wittgenstein. Philosophical Investigations. Blackwell Publishers, London, United Kingdom, 1953. Republished 2001.
[18] L. Yang, T. Sun, M. Zhang, and Q. Mei. We know what @you #tag: does the dual role affect hashtag adoption? In Proceedings of the 21st international conference on World Wide Web, WWW ’12, pages 261–270, New York, NY, USA, 2012. ACM.