Tweets are Not Created Equal. Intersecting Devices in the 1% Sample
1. Tweets are Not Created Equal
Intersecting Devices in the 1% Sample
Carolin Gerlitz & Bernhard Rieder
IR15 - Boundaries & Intersections
October 23, 2014
2. Digging deeper into Twitter devices
The Twitter API's 1% random sample can
be used to explore, baseline, contextualize,
verify, etc. (Gerlitz & Rieder 2013,
Morstatter et al. 2014).
How can we qualify individual elements in
relation to a larger platform ecology?
The presentation inquires more deeply into
the role devices play on Twitter.
We used a week-long random sample of
tweets to further explore this aspect.
(14 June 2014 to 20 June 2014, n = 31,707,162)
3. Devices intersect use practices
There has been a proliferation of very
different devices (mobile, desktop, web,
buttons, bots, etc.) from which people send
their tweets. It's full of devices!
Thinking of Twitter as an ecology of connected
devices, we ask (1) how we can qualify
devices and (2) how devices can enable us
to unpack metrics for studying use cultures.
Frequency-based metrics suggest that the
units they count are equivalent (e.g. tweets
per unit of time for a certain hashtag).
Do we need to conceptualize devices as
intervening variables?
10. Devices & use practices
Desktop clients (Web, Tweetdeck, etc.) are
overrepresented in news conversations;
Tweetdeck also points towards
professional social media practices.
The iPhone is the preferred microphone of
the American teenager.
Custom autopost clients (platforms, games,
etc.) are engaged in activity loops.
Automation clients (dlvr.it, IFTTT, or
Tweetadder) empower promotion, spam,
hijacking, and syndication practices.
Different devices have different
capacities and enable different ways of
engaging with the Twitter platform
(posting, observing, responding, etc.).
14. Devices intersect practices
Tweets are not created equal. Devices imply different regimes of "being on Twitter"
that are caught up in different perspectives, purposes, and politics.
Twitter takes part in complex platform ecologies that mediate tweeting in different
ways and are thus co-constitutive of practices. Devices intersect practices.
For Internet researchers, this creates problems and opportunities. Devices as
intervening variables can both skew and explain.
Frequency counts that do not take into account devices are problematic: do 100K
tweets from Tweetadder "mean" or "indicate" the same thing as 100K sent from the
iPhone? They refer to different populations, practices, purposes, and politics.
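As a rough illustration of treating devices as an intervening variable, the sketch below splits a hashtag's frequency count by sending device. It assumes simplified tweet records with a `source` string (the client name, usually wrapped in an HTML anchor by the API) and a flat `hashtags` list; the record layout, the function names, and the sample records are assumptions made for this example and are not how DMI-TCAT stores or processes data.

```python
import re
from collections import Counter, defaultdict

# The `source` value usually arrives as an HTML anchor around the client name.
TAG_RE = re.compile(r"<[^>]+>")

def device_name(source):
    """Strip any HTML tags from the source string and return the bare client name."""
    return TAG_RE.sub("", source).strip()

def devices_per_hashtag(tweets):
    """Map each lowercased hashtag to a Counter of device -> number of tweets."""
    breakdown = defaultdict(Counter)
    for tweet in tweets:
        device = device_name(tweet["source"])
        # Count each hashtag once per tweet, even if the tweet repeats it.
        for tag in {t.lower() for t in tweet["hashtags"]}:
            breakdown[tag][device] += 1
    return breakdown

# Hypothetical records, invented for illustration only.
sample = [
    {"source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
     "hashtags": ["callmecam", "CallMeCam"]},
    {"source": '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>',
     "hashtags": ["news"]},
    {"source": "web", "hashtags": ["news", "callmecam"]},
]

breakdown = devices_per_hashtag(sample)
for tag, devices in sorted(breakdown.items()):
    print(tag, dict(devices))
```

Two hashtags with the same total can then be compared by their device profiles rather than by the raw count alone.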
15. Conclusion
Frequency counts are not comparable from the outset, but need to be made
comparable by including devices in the interpretation.
Devices need to be taken into account when sampling, cleaning, analyzing, and
interpreting Twitter data.
This kind of unpacking and repacking of components in the platform ecologies can
be performed for various other elements. (cf. Bruns & Stieglitz 2013)
16. Thank you.
Carolin Gerlitz, c.gerlitz@uva.nl, @cgrltz
Bernhard Rieder, rieder@uva.nl, @riederb
DMI-TCAT (Borra & Rieder 2014), open source, available at:
https://github.com/digitalmethodsinitiative/dmi-tcat
Editor's Notes
This is work in progress
We changed our title
One aspect of a larger project on thinking about metrics in Twitter data analysis
68,747 different devices, specified by a field that API programmers have to fill out. Android and iPhone together account for a little more than 50% of all tweets.
The 1% sample captures 1 out of every 100 tweets, so what we look at are high-volume spaces; no fringe practices here.
For representativeness see: Morstatter, F., Pfeffer, J. & Liu, H., 2014. When is it biased? Assessing the representativeness of Twitter's streaming API. Proceedings of the companion publication of the 23rd International Conference on World Wide Web. International World Wide Web Conferences Steering Committee.
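A tally like the one in these notes (number of distinct devices, combined share of the top clients) could be computed along the following lines. This is a minimal sketch that assumes device names have already been extracted into a plain list; the function name and the toy list are hypothetical and the numbers are not taken from the sample.

```python
from collections import Counter

def device_shares(device_names):
    """Return the number of distinct devices and (device, share) pairs, most frequent first."""
    counts = Counter(device_names)
    total = sum(counts.values())
    shares = [(name, n / total) for name, n in counts.most_common()]
    return len(counts), shares

# Toy data, invented for illustration only.
names = (["Twitter for iPhone"] * 3 + ["Twitter for Android"] * 3
         + ["web"] * 2 + ["TweetDeck", "dlvr.it"])

distinct, shares = device_shares(names)
# Combined share of the two most frequent devices.
top_two = sum(share for _, share in shares[:2])
print(distinct, round(top_two, 2))
```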
It’s full of devices!
Previous approaches to hashtag qualification focused on user composition: Bruns, A. & Stieglitz, S., 2013. Towards more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology, 16(2), pp.91–108.
RT Japanese stuff: I follow everyone
Kudunews: celebrity
Explain sanket as element of TCAT
Issue-based hashtag based on English-language and Arabic tweets.
Points to YouTube and news sources.
Diversity of non-automated devices with a predominance of web clients and Tweetdeck: points towards professional and news-based practices.
#callmecam is an eventive mass-activation hashtag and the most frequently used hashtag on iPhone in the sampled week. It was designed by the YouTube celebrity Cameron Dallas to drive his fans' engagement. Whoever tweets the hashtag gets the chance to win a call from Dallas.
The hashtags it is connected to are more expressive variations of #callmecam.
The high volume of hashtags is due to the fact that many tweets use the hashtag several times.
The hashtag is mainly driven by users tweeting from iPhone, which can be considered a stand-in for teen practices.
Shoutout hashtags drive up the frequency of hashtags; seeing them from a device perspective allows us to approach what kind of users might be driving this frequency.
Teenage girls rule the trending topics.
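The #callmecam notes distinguish between how many tweets carry a hashtag and how often the hashtag occurs, since one tweet can repeat it several times. A minimal sketch of keeping the two counts apart, assuming each tweet is reduced to a list of its hashtags (the function name and the toy data are hypothetical):

```python
def hashtag_volume(tweets, tag):
    """Return (tweets containing the tag, total occurrences of the tag)."""
    tag = tag.lower()
    tweet_count = 0
    occurrences = 0
    for hashtags in tweets:
        # Case-insensitive count of the tag within this one tweet.
        n = sum(1 for h in hashtags if h.lower() == tag)
        if n:
            tweet_count += 1
            occurrences += n
    return tweet_count, occurrences

# Toy data, invented for illustration only: one list of hashtags per tweet.
toy = [["callmecam", "callmecam", "CallMeCam"], ["callmecam"], ["news"]]

tweets_with_tag, total_uses = hashtag_volume(toy, "callmecam")
print(tweets_with_tag, total_uses)
```

Reporting both figures makes it visible when a high hashtag frequency stems from repetition within tweets rather than from a larger number of tweets.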
#gameinsight has been a high-frequency hashtag since we started studying the one percent sample in January 2013.
Engaging with it from a device perspective allows us to understand the specific automated practices behind it. Tweets with #gameinsight are produced through autoposts from games, which are issued by default when users connect the game to their Twitter account. The tweets feature in-game achievements and contain links back to the Facebook app.
Due to its volume, the hashtag is constantly hijacked by spammers who send users to different websites they seek to promote.
The practice of hashtag hijacking becomes more apparent in the example of the hashtag #love. It is catered to by a multiplicity of devices, most notably Instagram, the happy space, and dlvr.it, which uses the hashtag #love to tap into its audience and promote two websites at high volume.
Points to systematic hashtag hijacking for promoting retail websites.
Looking at devices offers insights into how web content is being shared and contextualised.
A diverse set of devices links to NYT
Among them: NYT websites appear in spammy environments
YouTube has been one of the most shared domains on Twitter, but is slowly being overtaken by Vine.
Diversity of devices, high degree of hashtags, even from mainstream clients.
iPhone is used to promote the work of YouTube celebrity Nash Grier, a friend of Cameron Dallas, through the hashtag #nashnewvideo. These tweets make up a quarter of all tweets sent from iPhone that mention YouTube, pointing again to organic use practices driven by teenage fans.
YouTube autoposter appears as "Google"
Arabic tweets indicate alternative news syndicates
Diversity of devices, many automators
Hootsuite: autopost listings, etc.
Hashtag-intense environments
Extensive support infrastructure in Etsy communities on how to use Twitter for Etsy promotion
trailblazing: community specific promotional devices, fostered by support infrastructures and specific symposis
Language-specific use of devices