The document summarizes the Rhea algorithm for adaptively sampling authoritative content from social activity streams. Rhea forms a network of authoritative users as it processes the stream and samples only content from the top-K authoritative users based on an auth-value measure. It addresses challenges of maintaining user information efficiently, ranking users, and filtering irrelevant content. Experimental results on Twitter and StackOverflow data show Rhea outperforms white-list baselines in terms of precision, recall, and ranking accuracy of the sampled documents.
1. Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams
Panagiotis Liakos - Alexandros Ntoulas - Alex Delis
University of Athens, Greece
IEEE BigData 2017
December 11th-14th, 2017 - Boston, MA
6. Motivation
Mining social activity in real-time is valuable for numerous
applications:
opinion mining
content recommendation
emerging news detection
Processing the full activity stream of a social network is prohibitive:
storage
computational cost
Not all content is useful:
90% of tweets are conversational or spam!
Workaround: take a sample of the social
activity and use it to feed into applications!
Our approach:
Sample the content published by authorities
UoA Panagiotis Liakos Rhea-• Motivation 3/26
7. Related Work
Social Activity Stream Sampling:
White-lists of users [GSB+12, WLP+12, GZB+13, ZBG+16].
Focus is mainly on Twitter.
Our approach is adaptive and does not rely on static white-lists.
Authoritative users in Online Social Networks:
Network attributes [ZAA07, JA07, ACD+08, PC11, BBC+13].
We focus on streams, not networks.
8. Contribution
We propose Rhea:
A sampling algorithm for authoritative content that forms a
network of authorities as it processes a social activity stream,
and samples only the activity of the top-K authoritative users.
We build on:
Network-based measures and their adaptation in a streaming setting
Our findings on the disadvantages of white-list approaches
We outperform contemporary approaches with regard to
precision, recall, and ranking accuracy!
10. Network of Authorities from Social Activity
12. Ranking the Authorities
z-score: Zhang, Ackerman and Adamic, WWW 2007
Builds on positive and negative predictors of expertise:
$z(u) = \frac{a(u) - q(u)}{\sqrt{a(u) + q(u)}}$
where a(u) is the number of questions u has answered
and q(u) is the number of questions u has asked.
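As a minimal sketch of the z-score above, the function below computes it from the two counts. The name `z_score` and the choice to return 0 for a user with no recorded questions or answers are assumptions made for illustration; the slides do not say how that case is handled.

```python
import math


def z_score(answered, asked):
    """z-score of a user from the questions they answered and asked."""
    total = answered + asked
    if total == 0:
        # Assumption: a user with no activity gets a neutral score of 0.
        return 0.0
    return (answered - asked) / math.sqrt(total)


# A user who only answers scores positively, one who only asks negatively.
print(z_score(9, 0))
print(z_score(0, 4))
```

Dividing by the square root of the total activity means that a user with 9 answers and no questions ranks above a user with a single answer and no questions, even though both only answer.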
13. Ranking the Authorities
We propose auth-value:
A measure for a wide range of social networking sites:
$auth(u) = \frac{in(u) - out(u)}{\sqrt{in(u) + out(u)}}$
where in(u) is the weighted in-degree of u in the network of authorities
and out(u) is the respective weighted out-degree.
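The auth-value can be illustrated on a small weighted edge list. This is only a sketch: the `(source, target, weight)` edge representation and the function name `auth_values` are assumptions, and the slides do not specify how edges are derived from each social networking site.

```python
import math
from collections import defaultdict


def auth_values(edges):
    """Compute auth(u) for every user in a weighted, directed edge list.

    Each edge is a hypothetical (source, target, weight) triple with a
    positive weight: the target gains in-degree, the source out-degree.
    """
    indegree = defaultdict(float)
    outdegree = defaultdict(float)
    for source, target, weight in edges:
        outdegree[source] += weight
        indegree[target] += weight
    scores = {}
    for user in set(indegree) | set(outdegree):
        i = indegree.get(user, 0.0)
        o = outdegree.get(user, 0.0)
        scores[user] = (i - o) / math.sqrt(i + o)
    return scores


edges = [("bob", "alice", 3.0), ("carol", "alice", 1.0)]
scores = auth_values(edges)
print(sorted(scores, key=scores.get, reverse=True))
```

Every user that appears in an edge has a positive total degree, so the division is safe under the positive-weight assumption.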
17. Limitations of Static Lists of Authorities
[Figure: Precision@K (y-axis, 0.4 to 1) against K (authorities) (x-axis, 0 to 1000) for three pairs of months: Sept. 2009 & Oct. 2009, Sept. 2009 & Nov. 2009, and Sept. 2009 & Dec. 2009.]
We need an adaptive algorithm!
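One way to quantify how quickly a static list goes stale is to compare the top-K authorities of two periods. The sketch below assumes that Precision@K is the fraction of the earlier period's top-K users that are still among the later period's top-K; the slides do not spell out the exact definition, and the function name and toy rankings are hypothetical.

```python
def precision_at_k(reference_ranking, later_ranking, k):
    """Fraction of the reference top-k users still in the later top-k.

    Both rankings are lists of user ids ordered from most to least
    authoritative and are assumed to contain at least k entries.
    """
    top_reference = set(reference_ranking[:k])
    top_later = set(later_ranking[:k])
    return len(top_reference & top_later) / k


# Toy rankings for two periods (made-up user ids, not data from the talk).
earlier = ["a", "b", "c", "d"]
later = ["a", "e", "b", "f"]
print(precision_at_k(earlier, later, 2))
```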
18. Rhea: “She who flows”
Museum of Fine Arts, Boston
19. Rhea: Three Challenges
1. Maintaining user information
   may be costly in terms of both memory and CPU
2. Ranking users
   may require taking multiple measures into account
3. Many of the elements we opt to include may be irrelevant
20. Maintaining User Information
Count-Min sketch:
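[Figure: a Count-Min sketch, a d × w array of counters; each item i_t is hashed by h1, h2, ..., hd to one counter per row, and each of those counters is incremented by c_t.]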
Reducing the processing overhead through sampling:
We apply a Bernoulli sampling scheme [PJC+15].
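The two ideas on this slide can be sketched as follows. This is a generic Count-Min sketch and a generic Bernoulli sampler, not the authors' implementation: the class and function names, the use of salted BLAKE2b digests as the d hash functions, and the width and depth values are all assumptions.

```python
import hashlib
import random


class CountMinSketch:
    """A d x w table of counters giving approximate per-item counts."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best estimate.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


def bernoulli_sample(stream, p, rng):
    """Yield each stream element independently with probability p."""
    for element in stream:
        if rng.random() < p:
            yield element


sketch = CountMinSketch(width=1024, depth=4)
for user in bernoulli_sample(["alice", "bob", "alice", "alice"], 1.0, random.Random(0)):
    sketch.add(user)
print(sketch.estimate("alice"))
```

The sketch uses memory proportional to d × w regardless of how many distinct users appear, at the price of estimates that can exceed the true count but never fall below it.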
21. Ranking Authorities
We need to know at any time the top-K users by auth(u):
Algorithm 1: put(Top-K-Heap, key, value)
input: A Top-K-Heap structure and a key associated with a value to be inserted in the Top-K-Heap.
output: The updated Top-K-Heap.
begin
    if Top-K-Heap.size() < K then
        if Top-K-Heap.contains(key) then
            Top-K-Heap.replace(key, value);
        else
            Top-K-Heap.push(key, value);
    else
        if Top-K-Heap.contains(key) then
            Top-K-Heap.replace(key, value);
        else if value > Top-K-Heap.peek().value() then
            Top-K-Heap.pop();
            Top-K-Heap.push(key, value);
    return Top-K-Heap;
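A possible Python rendering of the put operation of Algorithm 1 is sketched below. The slides do not describe the internals of the Top-K-Heap, so this version makes its own assumptions: a dictionary holds the current value of each key, a `heapq` min-heap holds `(value, key)` entries, and replaced or evicted entries are discarded lazily when they reach the top of the heap.

```python
import heapq


class TopKHeap:
    """Keeps at most k keys, evicting the key with the smallest value."""

    def __init__(self, k):
        self.k = k
        self.values = {}  # key -> current value
        self.heap = []    # (value, key) entries; some may be stale

    def __contains__(self, key):
        return key in self.values

    def __len__(self):
        return len(self.values)

    def _store(self, key, value):
        self.values[key] = value
        heapq.heappush(self.heap, (value, key))

    def _prune(self):
        # Drop entries that no longer match the key's current value.
        while self.heap and self.values.get(self.heap[0][1]) != self.heap[0][0]:
            heapq.heappop(self.heap)

    def peek(self):
        """Return the (key, value) pair with the smallest value."""
        self._prune()
        value, key = self.heap[0]
        return key, value

    def pop(self):
        """Remove and return the (key, value) pair with the smallest value."""
        self._prune()
        value, key = heapq.heappop(self.heap)
        del self.values[key]
        return key, value

    def put(self, key, value):
        if key in self.values or len(self.values) < self.k:
            # Replace an existing key, or push while there is still room.
            self._store(key, value)
        elif value > self.peek()[1]:
            # Full heap: the newcomer displaces the current minimum.
            self.pop()
            self._store(key, value)


heap = TopKHeap(2)
for user, score in [("a", 1.0), ("b", 2.0), ("c", 3.0)]:
    heap.put(user, score)
print(sorted(heap.values))
```

Lazy deletion keeps replace cheap, but stale entries accumulate in `heap` until they surface; a production version would probably compact the heap periodically or use an indexed heap instead.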
22. Filtering-out Non-relevant Activity
While processing the stream, we may deem as an authority
a user that temporarily appears to be one.
We lose in precision!
Post-processing step:
The sample is much smaller than the stream: Ŝ ≪ S
We re-examine the elements of the sample and
filter out the activity of users not in the Top-K-Heap
23. Rhea
Forming the network of authorities
Sampling the stream
Removing irrelevant content
Algorithm 2: Rhea(S, K, p)
input: A stream S, a parameter K > 0 and a probability p ∈ (0, 1].
output: A set Ŝ ⊂ S containing elements whose respective users are likely to be among the top-K w.r.t. the auth-value.
begin
    Top-K-Heap ← ∅;
    CMSin ← ∅;
    CMSout ← ∅;
    foreach s ∈ S do
        if random(0, 1] < p then
            (in, out) ← extractIndicators(s.message);
            CMSin[in] += 1;
            CMSout[out] += 1;
        auth_user ← (CMSin[s.user] − CMSout[s.user]) / √(CMSin[s.user] + CMSout[s.user]);
        if auth_user > Top-K-Heap.low() then
            Top-K-Heap.put(s.user, auth_user);
            Ŝ.put(s);
    foreach s ∈ Ŝ do
        if s.user ∉ Top-K-Heap then
            Ŝ.remove(s);
    return Ŝ;
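The overall flow of Algorithm 2 can be sketched in a few lines of Python. Several simplifications here are assumptions rather than the authors' design: exact `Counter` objects stand in for the two Count-Min sketches, a plain dictionary stands in for the Top-K-Heap, each stream element is a hypothetical `(user, source, target)` triple in which `target` gains in-degree and `source` gains out-degree, the probability p is taken to gate only the counter updates, and the auth-value uses the square-root denominator from the auth(u) definition.

```python
import math
import random
from collections import Counter


def auth_value(indegree, outdegree):
    total = indegree + outdegree
    if total == 0:
        return 0.0  # assumption: users with no recorded activity score 0
    return (indegree - outdegree) / math.sqrt(total)


def rhea(stream, k, p, rng=None):
    """Return the sampled elements whose users end up among the top-k."""
    rng = rng or random.Random()
    cms_in, cms_out = Counter(), Counter()  # exact stand-ins for the sketches
    top_k = {}                              # user -> auth-value, at most k users
    sample = []
    for element in stream:
        user, source, target = element
        # Forming the network of authorities (updated with probability p).
        if rng.random() < p:
            cms_out[source] += 1
            cms_in[target] += 1
        # Sampling the stream.
        auth = auth_value(cms_in[user], cms_out[user])
        low = min(top_k.values()) if len(top_k) >= k else -math.inf
        if auth > low:
            if user not in top_k and len(top_k) >= k:
                del top_k[min(top_k, key=top_k.get)]  # evict current minimum
            top_k[user] = auth
            sample.append(element)
    # Removing irrelevant content: keep only users still in the top-k.
    return [element for element in sample if element[0] in top_k]


stream = [
    ("dave", "erin", "dave"),    # dave answers erin
    ("alice", "bob", "alice"),   # alice answers bob
    ("alice", "carol", "alice"), # alice answers carol
]
print(rhea(stream, k=1, p=1.0))
```

In this toy run dave is sampled first and later displaced by alice, so the final filtering pass removes his element; that is the situation the post-processing step is meant to correct.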
24. Experimental Evaluation
Datasets:
1. 467 million tweets from 20 million users of Twitter
2. 263,540 answers to 83,423 questions posted by 26,752 users of StackOverflow
Questions:
1. How does Rhea compare against white-list based sampling in terms of F1-score?
2. Is Rhea able to assess the ranking relevance of the sampled documents?
3. What is the impact of the parameters involved in the execution of Rhea?
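For reference, the F1-score in the first question is the harmonic mean of precision and recall. The sketch below computes all three for a sample against a set of relevant elements; how the relevant set is defined in the actual evaluation is not stated on the slides, so the function name and the toy inputs are purely illustrative.

```python
def precision_recall_f1(sampled, relevant):
    """Precision, recall and F1 of a sample against the relevant elements."""
    sampled, relevant = set(sampled), set(relevant)
    hits = len(sampled & relevant)
    precision = hits / len(sampled) if sampled else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example with made-up element ids.
print(precision_recall_f1({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
```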
27. Impact of Parameters
Varying the Value of Probability p:
Using a sample of 20% of S we achieve performance almost as
good as that of using S.
Using p = 0.2 instead of p = 1 greatly reduces processing time.
Removing the Filtering Step:
The impact is over 25 p.p. for K = 1,000 and is never less than 10 p.p. for
any K examined.
28. Conclusion
Rhea is the first adaptive algorithm for sampling
authoritative content from social activity streams.
We exposed the dynamic nature of the task.
We introduced a measure to identify authoritative users.
Rhea employs several techniques to achieve significantly
improved performance with regard to recall, precision, and
ranking accuracy.
29. References I
[ACD+08] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding high-quality content in social media. In Proc. of the Int. Conf. on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11-12, 2008, pages 183–194, 2008.
[BBC+13] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, and Giuliano Vesci. Choosing the right crowd: expert finding in social networks. In Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18-22, 2013, pages 637–648, 2013.
[GSB+12] Saptarshi Ghosh, Naveen Kumar Sharma, Fabrício Benevenuto, Niloy Ganguly, and P. Krishna Gummadi. Cognos: crowdsourcing search for topic experts in microblogs. In The 35th Int. ACM SIGIR Conf. on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, August 12-16, 2012, pages 575–590, 2012.
[GZB+13] Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Kumar Sharma, Niloy Ganguly, and P. Krishna Gummadi. On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream. In 22nd ACM Int. Conf. on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 1739–1744, 2013.
[JA07] Pawel Jurczyk and Eugene Agichtein. Discovering authorities in question answer communities by using link analysis. In Proc. of the 16th ACM Conf. on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 919–922, 2007.
[PC11] Aditya Pal and Scott Counts. Identifying topical authorities in microblogs. In Proc. of the 4th International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011, pages 45–54, 2011.
30. References II
[PJC+15] Deepan Subrahmanian Palguna, Vikas Joshi, Venkatesan T. Chakaravarthy, Ravi Kothari, and L. Venkata Subramaniam. Analysis of sampling algorithms for twitter. In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 967–973, 2015.
[WLP+12] Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson, and Markus Strohmaier. It’s not in their tweets: Modeling topical expertise of twitter users. In 2012 Int. Conf. on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 Int. Conf. on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012, pages 91–100, 2012.
[ZAA07] Jun Zhang, Mark S. Ackerman, and Lada A. Adamic. Expertise networks in online communities: structure and algorithms. In Proc. of the 16th Int. Conf. on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 221–230, 2007.
[ZBG+16] Muhammad Bilal Zafar, Parantapa Bhattacharya, Niloy Ganguly, Saptarshi Ghosh, and Krishna P. Gummadi. On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs. In Proc. of the 19th ACM Conf. on Computer-Supported Cooperative Work & Social Computing, CSCW 2016, San Francisco, CA, USA, February 27 - March 2, 2016, pages 437–450, 2016.
31. thank you!
for further details email me at:
p.liakos@di.uoa.gr