Social networks such as Facebook, Twitter, Google+
and LinkedIn have millions of users. These networks are constantly
evolving and are a good source of information, both
explicit and implicit. Social network analysis mainly
focuses on mapping relationships and patterns of interaction
between users and content. One common research topic
is centrality measures, where useful information about
the connected people in a social network is represented as
a graph. In this paper, we employed two link-based ranking
algorithms to rank the users: HITS (Hyperlink-Induced
Topic Search) and PageRank. We constructed a Twitter
user retweet-relationship graph using 21 days' worth of data.
Lastly, we compared the users' rankings with
their follower counts relative to the average, and with whether
they are verified Twitter accounts. In the results obtained,
both HITS and PageRank showed a similar trend and, more
importantly, highlighted the importance of the direction of the
edges in this work.
A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
1. A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
Ong Kok Chien, Poo Kuan Hoong and Chiung Ching Ho
Faculty of Computing and Informatics, Multimedia University, Cyberjaya.
3. Introduction
Graph analysis algorithms on social networks.
Identify noteworthy Twitter users for a
specific bag of topics.
4. Twitter
Microblogging site with a maximum of 140 characters per tweet.
“A Tweet is an expression of a moment or idea. It can
contain text, photos, and videos. Millions of Tweets are
shared in real time, every day.”
Reply
Retweet
Favorite
Hashtags
https://about.twitter.com/what-is-twitter/story-of-a-tweet
5. Problem Statement
Why is it important to rank users?
By ranking users, we aim to differentiate
relevant, important information sources from
those provided by spam accounts.
6. Objectives
To rank Twitter users using HITS and PageRank.
To identify the direction of edges for the graph.
7. Methods
Link-based ranking algorithms (HITS &
PageRank)
Twitter Users as Nodes.
Retweet relationships as Edges.
Direction of graph.
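The graph construction listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `build_retweet_graph`, the sample users, and the choice of pointing each edge from the retweeter to the original author are all assumptions made for the example.

```python
def build_retweet_graph(retweets):
    """Build a directed graph from (retweeter, original_author) pairs.

    Returns a dict mapping every user to the set of users they retweeted.
    Assumption: an edge points from the retweeter to the original author.
    """
    graph = {}
    for retweeter, author in retweets:
        # Make sure both users exist as nodes, even if they have no out-edges.
        graph.setdefault(retweeter, set())
        graph.setdefault(author, set())
        if retweeter != author:  # skip self-retweets
            graph[retweeter].add(author)
    return graph


# Hypothetical sample data: alice and bob retweet carol, carol retweets dave.
sample = [("alice", "carol"), ("bob", "carol"), ("carol", "dave")]
graph = build_retweet_graph(sample)

# In-degree = how many distinct users retweeted each user.
in_degree = {user: 0 for user in graph}
for targets in graph.values():
    for target in targets:
        in_degree[target] += 1

print(graph)
print(in_degree)
```

Reversing the tuple order would flip every edge, which is exactly the design decision the "Direction of graph" item refers to.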
8. Example
PageRank (PR)
E.g.: Backlinks in websites, referring back to the original content.
- Sergey Brin & Larry Page (1998). The anatomy of a large-scale hypertextual Web search engine.
Image extracted from Wikipedia
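As a rough illustration of the idea, PageRank can be computed by power iteration over a directed graph. The sketch below is a generic textbook version, not the implementation used in this study; the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from nodes without out-edges, and the sample graph are all assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: set of nodes it links to}.

    Every node must appear as a key. Rank held by nodes with no
    out-edges is spread evenly over all nodes (an assumed convention).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not graph[u])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for u in nodes:
            for v in graph[u]:
                # Each node shares its rank equally among its out-links.
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank


# Hypothetical graph: alice and bob link to carol, carol links to dave.
example = {"alice": {"carol"}, "bob": {"carol"}, "carol": {"dave"}, "dave": set()}
ranks = pagerank(example)
for user, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(user, round(score, 4))
```

Nodes that receive links from others end up with more rank than nodes that only give links, which is why the edge direction matters.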
9. Example
Hyperlink-Induced Topic Search (HITS)
Hubs: catalogs of relevant content.
Authorities: the great content itself.
- Jon Kleinberg (1999). Authoritative sources in a hyperlinked environment.
Image extracted from cornell.edu
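The hub and authority scores can be sketched with the usual mutual-update iteration. This is a generic version for illustration only, not the code used in this study; the iteration count, the Euclidean normalisation, and the sample graph are assumptions.

```python
import math


def hits(graph, iterations=50):
    """Iterative HITS on a dict {node: set of nodes it links to}.

    Returns (hub, authority) score dicts, each normalised to unit length.
    """
    nodes = list(graph)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes linking to it.
        auth = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in graph[u]:
                auth[v] += hub[u]
        # Hub score: sum of authority scores of the nodes it links to.
        hub = {u: sum(auth[v] for v in graph[u]) for u in nodes}
        # Normalise so the scores do not grow without bound.
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {u: x / a_norm for u, x in auth.items()}
        hub = {u: x / h_norm for u, x in hub.items()}
    return hub, auth


# Hypothetical graph: alice and bob link to carol, carol links to dave.
example = {"alice": {"carol"}, "bob": {"carol"}, "carol": {"dave"}, "dave": set()}
hub_scores, auth_scores = hits(example)
print("top authority:", max(auth_scores, key=auth_scores.get))
print("top hubs:", sorted(hub_scores, key=hub_scores.get, reverse=True)[:2])
```

In a retweet graph built with edges from retweeter to author, authorities would correspond to frequently retweeted users and hubs to users who retweet them.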
10. Example
Minister of Youth & Sports
Khairykj
shatyrah2 AyenSanji
https://twitter.com/Khairykj/status/410964119521460224
13. Basic Statistics
Raw Dataset
Total Tweets: 230,166
Total Unique Users: 121,461 (screen_name)
Total Verified Users: 113 (verified)
Average Followers Count: 983 (followers_count)
Experiment Dataset (Retweets)
No. of Tweets: 56,727
No. of Unique Users: 50,636
9th Dec 2013 -> 29th Dec 2013
14. Results
PageRank          Ranking   HITS
TeamMsia          1         TeamMsia
ManOlimpik        2         Khairykj
Khairykj          3         ManOlimpik
OKS_HARIMAUMUDA   4         OKS_HARIMAUMUDA
BrooksBeau        5         FIH_Hockey
TMCorp            6         BB_Johor
LawakLegend       7         AtletMalaysia
WTFSG             8         TMCorp
JanganPanas       9         Faif_D
FIH_Hockey        10        BBST15
60% of the top 10 users were the same under both algorithms (70% for the top 20).
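The top-10 overlap can be checked directly from the two ranked lists on this slide. The snippet below is only a small verification sketch; the variable names are made up for the example, and the top-20 lists are not shown in the slides, so only the top-10 figure is recomputed.

```python
# Top-10 screen names as listed on the Results slide, in rank order.
pagerank_top10 = [
    "TeamMsia", "ManOlimpik", "Khairykj", "OKS_HARIMAUMUDA", "BrooksBeau",
    "TMCorp", "LawakLegend", "WTFSG", "JanganPanas", "FIH_Hockey",
]
hits_top10 = [
    "TeamMsia", "Khairykj", "ManOlimpik", "OKS_HARIMAUMUDA", "FIH_Hockey",
    "BB_Johor", "AtletMalaysia", "TMCorp", "Faif_D", "BBST15",
]

# Users that appear in both lists, regardless of their exact position.
common = set(pagerank_top10) & set(hits_top10)
overlap_pct = 100.0 * len(common) / len(pagerank_top10)

print(sorted(common))
print(f"{overlap_pct:.0f}% overlap")
```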
15. Results
User Screen Name Verified Follower Count
TeamMsia False 98,469
ManOlimpik False 1,661
Khairykj True 432,259
OKS_HARIMAUMUDA False 35,058
BrooksBeau False 1,226,629
TMCorp False 13,767
LawakLegend False 49,593
WTFSG False 469,909
JanganPanas False 14,642
FIH_Hockey False 24,628
The number of followers of a Twitter user doesn't directly affect the number of retweets.
A verified account is not very important for getting retweeted.
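The comparison behind these two observations can be reproduced from the table on this slide and the dataset-wide average of 983 followers from the Basic Statistics slide. This is a small sketch with made-up variable names, not the authors' analysis code.

```python
AVERAGE_FOLLOWERS = 983  # dataset average from the Basic Statistics slide

# (screen_name, verified, followers_count), in PageRank order, from this slide.
top10 = [
    ("TeamMsia", False, 98469),
    ("ManOlimpik", False, 1661),
    ("Khairykj", True, 432259),
    ("OKS_HARIMAUMUDA", False, 35058),
    ("BrooksBeau", False, 1226629),
    ("TMCorp", False, 13767),
    ("LawakLegend", False, 49593),
    ("WTFSG", False, 469909),
    ("JanganPanas", False, 14642),
    ("FIH_Hockey", False, 24628),
]

above_average = [name for name, _, followers in top10 if followers > AVERAGE_FOLLOWERS]
verified = [name for name, is_verified, _ in top10 if is_verified]
by_followers = [name for name, _, _ in sorted(top10, key=lambda row: row[2], reverse=True)]

print(len(above_average), "of", len(top10), "are above the average follower count")
print("verified accounts:", verified)
print("order by follower count:", by_followers)
```

Sorting by follower count gives a visibly different order from the PageRank order: ManOlimpik is second by PageRank but has the fewest followers of the ten, and only one of the ten accounts is verified.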
18. Summary
The use of link-based ranking algorithms such
as PageRank and HITS promises some
insights concerning Twitter users and
their significance.
These insights can be useful for customer care
and churn management.
19. Future Work
Additional relationships to be considered.
(Conversational Reply, Pure Mentions)
Further validation of additional attributes.
(Verified, Tweet Count, Followers Count,
Following Count, etc.)
Extend deeper into Tweet-level analysis.