6. How do we begin to extract what happened and
what people talked about?
7. “Trending Topics” &
memes are
insufficient
Too general. Highlights
basically what you would want
to stoplist.
8. M. Bernstein, B. Suh, L. Hong, J. Chen, S. Kairam, and E. Chi. Eddi: Interactive
Topic-based Browsing of Social Status Streams. ICWSM, 2010.
S. Vieweg, A. L. Hughes, K. Starbird, and L. Palen. Microblogging during two
natural hazards events: what twitter may contribute to situational awareness. In
CHI ’10: Proceedings of the 28th international conference on Human factors in
computing systems, pages 1079–1088, New York, NY, USA, 2010. ACM.
A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding
microblogging usage and communities. In WebKDD/SNA-KDD ’07: Proceedings of
the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social
network analysis, pages 56–65, New York, NY, USA, 2007. ACM.
B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps about twitter. In WOSP ’08:
Proceedings of the first workshop on Online social networks, pages 19–24, New
York, NY, USA, 2008. ACM.
9. traditional tf•idf won’t work: a tweet is a very
small document...too small. The vector is too flat.
“OMG - Obama just messed up the oath -
AWESOME! he’s human!”
Create pseudo-documents from 5-minute slices
10. No significant
occurrence of a
term.
Term occurs
significantly
Find terms which are only
windows are 5 minute slices
particular to a window
11. Find terms which are only
windows are 5 minute slices
particular to a window
18. 0.35
0.25
Terms are not sailent However they persist.
19. People announce
(12:05) Bastille71: OMG - Obama
just messed up the oath -
AWESOME! he’s human!
(12:07) ryantherobot: LOL Obama
messed up his inaugural oath
twice! regardless, Obama is the
president today! whoooo!
(12:46) mattycus: RT @deelah: it
wasn’t Obama that messed the
oath, it was Chief Justice
Roberts: http://is.gd/gAVo
(12:53) dawngoldberg:
@therichbrooks He flubbed the
oath because Chief Justice
screwed up the order of the
words.
20. People reply
(12:05) Bastille71: OMG - Obama
just messed up the oath -
AWESOME! he’s human!
(12:07) ryantherobot: LOL Obama
messed up his inaugural oath
twice! regardless, Obama is the
president today! whoooo!
(12:46) mattycus: RT @deelah: it
wasn’t Obama that messed the
oath, it was Chief Justice
Roberts: http://is.gd/gAVo
(12:53) dawngoldberg:
@therichbrooks He flubbed the
oath because Chief Justice
screwed up the order of the
words.
And now for the quant method part of the show. Spend more time setting the context for the data. Story and cultural background is important here.\n
Emphasize what this is as a broadcast media event. And its a big deal.\n
\n
\n
\n
\n
There is a need to look beyond memes and trends. These are actually good for pulling out over-arching themes, but poor at describing the event.\n
\n
So for every 5 minutes, we ignore who tweeted what...just pool it all together.\n
\n
\n
\n
\n
Not everything “trends” not everything is a “spike”\n
\n
\n
\n
\n
\n
\n
Explain this too. What is this...\n
\n
\n
\n
Its important over time, even if its not a trend or if it trends momentarily.\n\n\n
We’ve taken this through mechanics and techniques; this augments SNA. We are pushing the bounds of SNA in regards to media and event. The importance of an event needs to be measured over time, not just in the right here right now. We know the revolution will be something to remember. Its what happened in there that we need to pull out.\n
thanks\n\nReviewer 3 raises some concerns about this proposed variation of TFIDF with particular attention to why the TF metric is defined at the document level. We count the number of tweets that a term appears in (rather than raw total frequency of the term) since we are focused on gauging the number of *people* using a term at a given time. A raw count of occurrence can be biased by a single user repeating a term many times in one tweet, which we did observe. R3 also noticed that our TFIDF variant is sensitive to very infrequent terms (which is also the case with traditional TFIDF). We also observed this problem and handled it by treating terms with frequencies below a certain threshold as stop words.\n