This document summarizes an analysis of Twitter data on consumer electronics brands collected on 13–25 May 2014. In a one-hour trial, “Apple” and “iPhone” accounted for 87% of tweet volume, so they were dropped from the keywords for the actual collection. The 15.3 GB of raw JSON was reduced to 757 MB of CSV. Tweets mentioning seven brands were tagged for sentiment; most brands had a roughly 1:1 ratio of positive to negative tweets, except Samsung, which had a ratio of roughly 3:2. Spikes in tweet volume coincided with product launches. The analysis also revealed differences between the users tweeting about different brands and identified opportunities for future analysis.
2. Which brands get tweeted about most? Is it mainly positive or negative?
3. 15.3 GB of JSON data downloaded from Twitter’s Streaming API
between 13–25 May using Python
4. Before processing, tweets were in raw JSON format
Time Created
Tweet text/status
Username
Tweet location (if available)
No. of followers
No. of people followed
No. of statuses
Language
Data should be optimized because only a fraction of it is used for analysis; optimization improves model performance and saves cost and time
5. The same tweet we saw previously
By optimizing the data,
15.3 GB of JSON was converted to 757 MB of CSV (5% of original size)
After processing, only some fields were retained and converted to CSV format
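The conversion step can be sketched roughly as below. This is a minimal illustration, not the author's script: it assumes one JSON object per line, and the field names (`created_at`, `user.screen_name`, `followers_count`, and so on) are assumed from the Twitter API v1.1 tweet format. The helper names `extract` and `json_lines_to_csv` are hypothetical.

```python
import csv
import io
import json

# CSV column -> path into the tweet object.
# Field names are an assumption based on the Twitter API v1.1 tweet format.
FIELDS = {
    "created_at": ("created_at",),
    "text": ("text",),
    "username": ("user", "screen_name"),
    "location": ("place", "full_name"),
    "followers": ("user", "followers_count"),
    "following": ("user", "friends_count"),
    "statuses": ("user", "statuses_count"),
    "lang": ("lang",),
}


def extract(tweet, path):
    """Walk a nested dict along `path`; return '' if any step is missing."""
    value = tweet
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return ""
        value = value[key]
    return "" if value is None else value


def json_lines_to_csv(lines, out):
    """Write one CSV row per tweet and return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(FIELDS.keys())
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt records
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # skip non-tweet messages such as delete notices
        writer.writerow([extract(tweet, path) for path in FIELDS.values()])
        written += 1
    return written


# Hypothetical sample input: one tweet, one blank line, one delete notice.
sample = [
    '{"created_at": "Tue May 13 08:00:00 +0000 2014", "text": "Nokia Lumia!", '
    '"lang": "en", "place": null, "user": {"screen_name": "alice", '
    '"followers_count": 250, "friends_count": 250, "statuses_count": 1200}}',
    "",
    '{"delete": {"status": {"id": 1}}}',
]
buffer = io.StringIO()
written = json_lines_to_csv(sample, buffer)
```

Dropping every field that is not needed downstream is where most of the size reduction would come from, since a raw tweet object carries many nested attributes that the analysis never reads.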
6. Each tweet is tagged with a brand and a sentiment:
Brand + Positive Sentiment
Brand + Negative Sentiment
Brand + Mixed Sentiment
The list of words for sentiment analysis is adapted from
the Harvard General Inquirer dictionaries
Source: http://www.wjh.harvard.edu/~inquirer/homecat.htm, downloaded on 28 May 2014
Tweets are then tagged for brand and sentiment in R
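The tagging logic might look something like the sketch below. The original tagging was done in R with word lists adapted from the General Inquirer; this Python version is only an illustration, and the tiny `POSITIVE` and `NEGATIVE` sets are placeholders rather than the actual lists. The function name `tag_tweet` is hypothetical.

```python
import re

# The seven brand keywords used in the actual collection.
BRANDS = ["samsung", "sony", "nokia", "htc", "huawei", "blackberry", "motorola"]

# Placeholder word lists for illustration only; the real lists were
# adapted from the Harvard General Inquirer dictionaries.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "broken", "hate"}


def tag_tweet(text):
    """Return (brands mentioned, sentiment label) for one tweet."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    brands = [brand for brand in BRANDS if brand in tokens]
    has_positive = bool(tokens & POSITIVE)
    has_negative = bool(tokens & NEGATIVE)
    if has_positive and has_negative:
        sentiment = "mixed"
    elif has_positive:
        sentiment = "positive"
    elif has_negative:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return brands, sentiment
```

For example, `tag_tweet("I love my Nokia")` yields one brand with positive sentiment, while a tweet containing both a positive and a negative word is labelled mixed.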
7. Initially, tweets were collected based on 17 keywords
Samsung
S4
Xperia
HTC
Huawei
BlackBerry
Apple
S5
Sony
Nokia
Note 3
Lumia
q5
iPhone
q10
z10
Motorola
8. “Apple” and “iPhone” accounted for 87% of tweet volume
Removed from keywords during actual data collection to focus on other brands, save space, and reduce bandwidth usage
A trial was conducted with 16 keywords on 11 May, 8–9 am
1 GB of JSON data was collected in an hour
During a one-hour trial, “Apple” and “iPhone” had 87% share of tweets
9. Samsung
Sony
Nokia
HTC
Huawei
BlackBerry
Motorola
Tweets containing seven keywords were collected from 13–25 May
10. 4% of tweets mentioned 2 or more brands; they were excluded from analysis
8% of tweets had mixed sentiment (i.e., positive and negative sentiment); they were excluded from analysis
92% of tweets remained, each only mentioning 1 brand with either “positive”, “negative”, or “neutral” sentiment
3,681,942 tweets were collected
After processing, 3,234,678 tweets remained for analysis
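The exclusion rules on this slide amount to a simple filter. The sketch below assumes each tweet has already been tagged with a list of brands and a sentiment label; the record layout and the function name `keep_for_analysis` are hypothetical, and the sample records are made up.

```python
def keep_for_analysis(tagged_tweets):
    """Keep tweets that mention exactly one brand and are not mixed."""
    return [
        tweet
        for tweet in tagged_tweets
        if len(tweet["brands"]) == 1 and tweet["sentiment"] != "mixed"
    ]


# Made-up records for illustration.
tagged = [
    {"brands": ["nokia"], "sentiment": "positive"},
    {"brands": ["sony", "htc"], "sentiment": "negative"},  # multiple brands
    {"brands": ["samsung"], "sentiment": "mixed"},  # mixed sentiment
    {"brands": ["htc"], "sentiment": "neutral"},
]
kept = keep_for_analysis(tagged)
share_kept = len(kept) / len(tagged)
```

Restricting the data to single-brand, single-sentiment tweets keeps the attribution unambiguous: every remaining tweet counts toward exactly one brand and one sentiment bucket.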
11. Samsung leads in Twitter buzz, followed by Sony and Nokia
Together, they make up 75% of Twitter buzz
Samsung is the clear leader in Twitter buzz, followed by Sony and Nokia
However, Samsung and Sony have wider product offerings relative to the rest that mainly focus on phones
Also, Huawei’s users may mainly be on Weibo, Renren, etc
12. Most brands have a roughly 1:1 ratio of positive to negative tweets
Samsung is the exception, with a ratio of roughly 3:2
Most brands have a roughly equal ratio of positive to negative tweets
13. Dip due to connectivity issues
Brands’ share of tweets is roughly consistent over time
16. Users who tweet about BlackBerry tend to be better connected (i.e., higher median of followers and people followed)*
* Excluding outliers
Across brands, there is not much difference in user connectedness
The median user has around 250 followers and also follows 250 people
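A median "excluding outliers" could be computed as in the sketch below. The slides do not say how outliers were defined, so the 1.5 × IQR rule used here is an assumption, as is the function name `median_without_outliers`.

```python
import statistics


def median_without_outliers(values, k=1.5):
    """Median after dropping values outside [Q1 - k*IQR, Q3 + k*IQR].

    The k = 1.5 fence is the common boxplot convention, assumed here;
    the original analysis may have used a different rule.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    return statistics.median(kept)


# Made-up follower counts with one extreme account.
followers = [100, 200, 250, 300, 400, 1_000_000]
typical = median_without_outliers(followers)
```

Because follower counts are heavily skewed, a median (rather than a mean) is already fairly robust; trimming outliers mainly matters for plots and for the upper percentiles.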
17. 50th–75th percentile of users who tweet about Sony, HTC, and Motorola have very high numbers of all-time tweets (spam bots perhaps?)*
While Nokia is 3rd in Twitter buzz share (14%), users who tweet about Nokia have the lowest numbers of all-time tweets
This suggests that the tweets are likely to come from real users rather than bots (or perhaps from less active bots)
* Excluding outliers
However, there is a large difference between users’ all-time tweets
19. 1,753,696 tweets
1,730,006 tweets
A bot that retweets on farts has the highest all-time tweets
21. Initially, BlackBerry tweets showed 100% negative sentiment
The culprit was the word “lack”, which appears inside “BlackBerry”; it was removed from the word list
However, removing it reduced negative sentiment for other brands by 2–3%
An interesting error led to BlackBerry having 100% negative sentiment
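The slides do not show how words were matched, but the error is consistent with plain substring matching, since "lack" occurs inside "blackberry". The sketch below contrasts that with whole-word matching, which is one possible alternative to removing the word (not the fix the author used). Both function names are hypothetical.

```python
import re


def contains_substring(text, word):
    """Naive match: true whenever `word` appears anywhere in the text."""
    return word in text.lower()


def contains_word(text, word):
    """Whole-word match: `word` must be delimited by word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text.lower()) is not None


tweet = "New BlackBerry z10 looks great"
naive_hit = contains_substring(tweet, "lack")  # matches inside "blackberry"
strict_hit = contains_word(tweet, "lack")  # no standalone "lack" present
```

With whole-word matching, "lack" could stay in the negative list and still be detected in tweets such as "I lack patience", which would avoid the 2–3% drop in negative sentiment for the other brands.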
22. Track brands’ managed Twitter accounts and conversations to measure engagement
Which brands have better engagement with users and why?
Track general message of tweets
Are tweets of a brand mainly about sales, reviews, complaints, or news?
Network analysis to identify users with high centrality and influence
Which users have high influence and what are they tweeting about my brand?
Geospatial analysis of tweets
Are there differences in brand buzz, sentiment, and engagement across regions?
Where do we go from here?
23. Code available on GitHub: https://github.com/eugeneyan/Twitter-SMA
Python script to download tweets in JSON format
Python scripts to convert tweets from JSON to CSV (with & without regular expressions filtering)
R script and sentiment analysis list of words
R script and sentiment analysis list of words to reproduce BlackBerry error