The document provides an overview of Stephen Dann's methodology for analyzing Twitter data and conversations. It discusses acquiring Twitter data through personal timelines, hashtag captures, or timeline captures. The data is then processed by extracting tweets into Excel for manual coding into categories. Additional analysis includes LIWC for word counts and Leximancer for concept mapping. Metrics like tweet count, character density, network density, and average/unique word counts are calculated and normalized for comparison across categories. The analysis aims to provide insights into research questions about changes in tweeting patterns, hashtag conversations over time, or account engagement.
3. A little context
The Past
• Dann (2010)
– Six top-level Twitter categories
– 23 sub domains
• Dann (2011)
– Six top-level categories
– 28 sub domains
The Present
• Dann (Today)
– Six Top Level Categories
• No sub domain analysis
– Secondary Processing
• Leximancer
• Linguistic Inquiry and Word Count (LIWC)
5. Acquire Research Question
• Does Event X change the tweeting patterns of Account @Y?
• Do responses to the #hashtag event change over time?
– #EventTags in Time Period A will have more Status than in Time Period D
– Time Period D will have more Pass Along than Status
• What were they thinking?
– Dominant Categories of tweets over time within a selected account
• Do comments change by platform for account @X?
– mobile versus web versus desktop
• Does @BrandX engage with the community?
– Conversational over all other types over capture time period
6. Acquire your data
• Personal timelines
– Download from Twitter
• #Hashtag captures
– Hootsuite
• Timeline captures
– Choose your own adventure
– Getting worse and harder as Twitter’s API becomes less available
• Try to avoid big data
7. Big Data
• If you are Axel Bruns, fine, continue
– http://mappingonlinepublics.net/
• For everyone else, what are you looking for?
– What sample suits your research question?
8. Process your data
• Stand by for ugliness and manual coding*
– Extract data into Excel
• Excel allows for additional data inputs as the analysis progresses
– Keep tweet visible
• Only keep a column visible if it fits your research question
– E.g. date, time, @user, platform
– Add column for Tweet ID, category, cat_n
• Sub category, sub_cat_n for the detailed version
*Automated coding? People are working on it. It’s a terrible idea that’ll happen anyway
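A minimal sketch of the sheet layout described above, for anyone who prefers to start from a CSV file that Excel can open. The column names follow the slide; the `build_coding_sheet` helper, the input format and the sample tweet are assumptions made for illustration. Coding itself stays manual: the category columns are written out blank.

```python
import csv
import io

# Columns named on the slide. The optional ones are kept only when the
# research question needs them.
CORE_COLUMNS = ["tweet_id", "tweet", "category", "cat_n"]
OPTIONAL_COLUMNS = ["date", "time", "user", "platform"]


def build_coding_sheet(tweets, keep=()):
    """Return CSV text with one row per tweet and blank coding columns.

    `tweets` is a list of dicts (a hypothetical input format); `keep` lists
    the optional columns that fit the research question.
    """
    unknown = [c for c in keep if c not in OPTIONAL_COLUMNS]
    if unknown:
        raise ValueError(f"unknown optional columns: {unknown}")
    columns = CORE_COLUMNS + list(keep)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for number, tweet in enumerate(tweets, start=1):
        row = {c: tweet.get(c, "") for c in columns}
        row["tweet_id"] = tweet.get("tweet_id", number)
        row["category"] = ""  # filled in by hand during manual coding
        row["cat_n"] = ""
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data, not taken from the study.
sample = [{"tweet": "Heading to the conference", "platform": "mobile"}]
sheet = build_coding_sheet(sample, keep=["platform"])
print(sheet)
```

Sub category and sub_cat_n columns could be added to `CORE_COLUMNS` in the same way for the detailed version.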
9. Manual Coding
• Use the Dann (2010) or Dann (2011) top-level domains
– Dann (201X) is under development
• I broke something important earlier this year
• Manual coding is superior
– Nuance and interpretation counts.
10. Pick a box
1  Conversational  Uses an @statement to address another user
2  News Events     Identifiable news content
3  Pass along      Tweets of endorsement of content
4  Phatic          Content independent connected presence
5  Status          Tweets which address the statement "What are you doing?" and "What's happening?" in terms of an account holder's experiences
6  Spam            Unsolicited content
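The numbered boxes above map directly onto the cat_n column of the coding sheet. A small lookup like the following, offered as a sketch, can catch mistyped codes once the manual coding is done. The `category_name` helper is an assumption for illustration; only the codes and labels come from the slide.

```python
# Top-level codes as numbered on the slide.
CATEGORIES = {
    1: "Conversational",
    2: "News Events",
    3: "Pass along",
    4: "Phatic",
    5: "Status",
    6: "Spam",
}


def category_name(cat_n):
    """Return the label for a manually assigned code, rejecting anything else."""
    try:
        return CATEGORIES[int(cat_n)]
    except (KeyError, ValueError, TypeError):
        raise ValueError(f"not a top-level category code: {cat_n!r}") from None


print(category_name(3))
```

This only validates codes a human has already assigned; it does not classify tweets.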
11. Keep it on manual
Conversational  Uses an @statement to address another user
1.1  Action             Activities involving other Twitter users, or tweets which describe the presence of other Twitter users.
1.2  Query              Any statement style tweet that ends with a question mark, as it represents an active attempt to engage responses from the community
1.3  Referral           An @response which contains URLs or recommendation of other Twitter users. (Excludes RT @user)
1.4  Response           Classification for tweets which commence with another user’s name and which do not meet the requirements of the referral category
1.5  Rhetoric Question  Asked and answered within the same tweet (distinct from Conversational - Query) which may not require (but may elicit) audience response
12. Upgrades
Pass along  Tweets of endorsement of content
3.1  Automated Endorsement   Status announcements triggered by third party applications which publish URLs
3.2  Endorsement             Links to web content not created by the sender
3.3  Retweet                 Any statement reproducing another Twitter status using the via @ or RT protocol
3.4  Secondary Social Media  Links to Facebook (fb.me) or similar social media platform
3.5  User generated content  Links to own content created by the user
3.6  Quote                   Comment marked with “ ” to represent a direct quote, paraphrase of a statement without a source URL, including reference to offline speaker or overheard (OH)
3.7  Cite                    Any tweet which contains a reference in a recognised Harvard, Oxford or similar format
3.8  Modified ReTweet        Acknowledgement of the use of MT protocol to allow for an edited RT.
18. Tweet Math Dude
• Tweet Count
– N per category
• Calculate the Tweet Ratio
– Tweet ratio is a normalized rank order of tweet volume, where the most common category is scored as 1
• Calculating the Tweet Ratio
– Highest number of tweets in a single category = TTMax
– Tweets per category = TCat
– Ratio is TCat / TTMax
I’m only mildly mocking statistical analysis here
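The Tweet Ratio above can be sketched in a few lines. The function name `tweet_ratios` and the counts are hypothetical; only the TCat / TTMax definition comes from the slide.

```python
def tweet_ratios(counts):
    """Map each category to TCat / TTMax.

    `counts` is a dict of category -> number of tweets (TCat).
    TTMax is the largest count, so the busiest category scores 1.
    """
    if not counts:
        return {}
    tt_max = max(counts.values())
    if tt_max == 0:
        raise ValueError("no tweets in any category")
    return {category: n / tt_max for category, n in counts.items()}


# Hypothetical counts, not taken from the study.
counts = {"Conversational": 40, "News": 20, "Pass along": 80, "Status": 10}
ratios = tweet_ratios(counts)
print(ratios)
```

With these made-up counts, Pass along is the most common category and scores 1, and every other category is expressed as a fraction of it.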
19. Maximum Character Density
• Max Density = 140 x TCat [number of tweets in each category]
• Theoretical range for a tweet is between 1 and 140 characters
• Maximum tweet is 140 characters
• More characters used, more information density
• Calculate Character Density
– (Actual Characters / Max Density)
• Divide each CharDensity score by the highest CharDensity
• Normalise CharDensity score to rank order
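A sketch of the Character Density steps above, under two assumptions: tweets are held as plain strings grouped by category, and Python's `len` is an acceptable character count (it counts code points, which may not match how Twitter counts). The function names and sample tweets are hypothetical.

```python
MAX_TWEET_LENGTH = 140  # character limit used on the slide


def char_density(tweets_by_category):
    """Actual characters / Max Density, where Max Density = 140 x TCat."""
    densities = {}
    for category, tweets in tweets_by_category.items():
        if not tweets:
            continue  # no tweets means no density to report
        actual = sum(len(tweet) for tweet in tweets)
        max_density = MAX_TWEET_LENGTH * len(tweets)
        densities[category] = actual / max_density
    return densities


def normalise(scores):
    """Divide each score by the highest score so the top category is 1."""
    top = max(scores.values())
    return {category: score / top for category, score in scores.items()}


# Hypothetical tweets built by repetition, purely to give known lengths.
sample = {
    "Status": ["a" * 70, "b" * 70],
    "Pass along": ["c" * 105],
}
density = char_density(sample)
ranked = normalise(density)
print(density, ranked)
```

Here Status uses 140 of a possible 280 characters and Pass along uses 105 of 140, so Pass along becomes the reference category after normalisation.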
23. LIWC
• http://www.liwc.net/
– text analysis software
– calculates the degree to which people use different categories of words in texts
• 70 other language dimensions.
– positive or negative emotions,
– self-references,
– causal words,
24. A giant bucket of data
• 70 variables
– So have a hypothesis and a purpose for the analysis
• Differences in tweet construction
– Word Counts
– Unique Words
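For a quick check outside LIWC, word counts and unique words can be approximated as below. This is not LIWC's method: the whitespace tokeniser, the punctuation stripping (which also removes a leading @ or #) and the reading of "unique words" as the number of distinct words in a category are all assumptions, and the sample tweets are invented.

```python
import string


def tokenize(tweet):
    """Split on whitespace, lower-case, and trim punctuation from each end."""
    words = (w.strip(string.punctuation).lower() for w in tweet.split())
    return [w for w in words if w]


def word_stats(tweets):
    """Return (average words per tweet, number of distinct words)."""
    tokens_per_tweet = [tokenize(t) for t in tweets]
    total = sum(len(tokens) for tokens in tokens_per_tweet)
    unique = {w for tokens in tokens_per_tweet for w in tokens}
    average = total / len(tweets) if tweets else 0.0
    return average, len(unique)


# Hypothetical tweets for one category.
sample = ["What are you doing?", "What is happening? Not much."]
average, unique = word_stats(sample)
print(average, unique)
```

Running the same function once per category gives figures that can then be normalised against the highest-scoring category, as with the other metrics.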
25. Results
Average Word Count (AWC)
Category        AWC    AWC_Ratio
Conversational  12.82  0.78
News            13.56  0.82
Pass Along      16.35  1
Phatic          15.42  0.94
Status          12.94  0.79

Unique Word Count (UWC)
Category        UWC  UWC_Ratio
Conversational  93   0.97
News            93   0.97
Pass Along      92   0.96
Phatic          93   0.97
Status          96   1
26. Results
[Charts: Word Count and Unique Word ratios by category (Conversational, News, Pass Along, Phatic, Status)]
28. Leximancer
• Import into Leximancer as an individual analysis (individual project)
– Edit pre-processing options: Sentences per block = 1
– Run to Generate Outputs
– Generate Concept Map
30. Four sample maps
Entirely because quadrants fit on screens better than hexes. No other reason
[Concept maps: Conversational, News, Pass Along, Phatic]
31. Tweet Network Density
• Calculate Network Density
– Count nodes (n)
– Count actual connections (e): edges (paths between nodes)
– Calculate network density as 2e / (n(n-1))
• Network Density Notes
– Calculate potential connections: n(n-1) / 2
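The density calculation can be sketched as follows. The function name is hypothetical; the formula is the 2e / n(n-1) from the slide, written as actual connections divided by potential connections, and the example inputs are the node and edge counts reported in the Network Density Results table.

```python
def network_density(nodes, edges):
    """Actual connections divided by potential connections.

    Potential connections in an undirected network are n(n-1)/2, so this
    is the same as 2e / (n(n-1)).
    """
    if nodes < 2:
        raise ValueError("density needs at least two nodes")
    potential = nodes * (nodes - 1) / 2
    return edges / potential


# Node and edge counts from the results table in this deck.
for category, n, e in [
    ("Conversational", 13, 12),
    ("News", 18, 17),
    ("Pass Along", 15, 15),
    ("Phatic", 3, 2),
    ("Status", 4, 3),
]:
    print(category, round(network_density(n, e), 2))
```

Small maps score high because they have few potential connections: a three-node Phatic map with two edges already covers two thirds of them.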
33. Network Density Results
Category        Nodes  Edges  Network Density
Conversational  13     12     0.15
News            18     17     0.11
Pass Along      15     15     0.14
Phatic          3      2      0.67
Status          4      3      0.50
n               19     17     0.10
[Chart: Network Density by category (Conversational, News, Pass Along, Phatic, Status)]
34. One Bucket of Data
• This is why a research question is important
– You can map a range of information
– None of it is useful without the RQ / hypothesis
– It’s pretty, but not valuable
Category        Tweet     Density   Network   Ave.WC    Unique Words
Conversational  0.081081  0.819075  0.814598  0.830959  0.96875
News            0.085239  0.83315   0.828595  0.878952  0.96875
Pass Along      1         1.005496  1         1.059722  0.958333
Phatic          0.043659  0.938173  0.933044  1         0.96875
Status          0.037422  0.775065  0.770829  0.838992  1
[Chart: Tweet, Density, Network, Ave.WC and Unique Words by category (Conversational, News, Pass Along, Phatic, Status)]