Rhea: Adaptively Sampling Authoritative
Content from Social Activity Streams
Panagiotis Liakos - Alexandros Ntoulas - Alex Delis
University of Athens, Greece
IEEE BigData 2017
December 11th-14th, 2017 - Boston, MA
UoA Panagiotis Liakos Rhea-• Motivation 2/26
500 million tweets
sent each day!
Motivation
Mining social activity in real-time is valuable for numerous
applications:
opinion mining
content recommendation
emerging news detection
Processing the full activity stream of a social network is prohibitive:
storage
computational cost
Not all content is useful:
90% of tweets are conversational or spam!
Workaround: take a sample of the social
activity and use it to feed into applications!
Our approach:
Sample the content published by authorities
Related Work
Social Activity Stream Sampling:
White-lists of users [GSB+12, WLP+12, GZB+13, ZBG+16].
Focus is mainly on Twitter.
Our approach is adaptive and does not rely on static white-lists.
Authoritative users in Online Social Networks:
Network attributes [ZAA07, JA07, ACD+08, PC11, BBC+13].
We focus on streams, not networks.
Contribution
We propose Rhea:
A sampling algorithm for authoritative content that forms a
network of authorities as it processes a social activity stream,
and samples only the activity of the top-K authoritative users.
We build on:
Network-based measures and their adaptation in a streaming setting
Our findings on the disadvantages of white-list approaches
We outperform contemporary approaches with regard to
precision, recall, and ranking accuracy!
Network-based measures
Network of Authorities from Social Activity
Ranking the Authorities
z-score: Zhang, Ackerman and Adamic, WWW 2007
Builds on positive and negative predictors of expertise:
$$z(u) = \frac{a(u) - q(u)}{\sqrt{a(u) + q(u)}}$$
where, a(u) is the number of questions u has answered
and q(u) is the number of questions u has asked.
Ranking the Authorities
We propose auth-value:
A measure for a wide range of social networking sites:
$$auth(u) = \frac{in(u) - out(u)}{\sqrt{in(u) + out(u)}}$$
where, in(u) is the weighted in-degree of u in the network of authorities
and out(u) is her respective weighted out-degree.
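As a rough illustration of the definition above, the auth-value can be computed directly from the two weighted degrees. The function name `auth_value` and the choice to return 0 for a user with no recorded activity are assumptions made for this sketch; the slides do not say how that case is handled.

```python
import math


def auth_value(w_in: float, w_out: float) -> float:
    """auth(u) = (in(u) - out(u)) / sqrt(in(u) + out(u))."""
    total = w_in + w_out
    if total == 0:
        # Assumption: a user with no in- or out-activity scores 0.
        return 0.0
    return (w_in - w_out) / math.sqrt(total)


print(auth_value(100, 0))   # 10.0  (only incoming activity)
print(auth_value(50, 50))   # 0.0   (balanced activity)
print(auth_value(0, 100))   # -10.0 (only outgoing activity)
```

The square root in the denominator lets the score grow with the volume of activity, so a user with 100 incoming references outranks one with a single incoming reference even though both have only incoming activity.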
Our findings on
White-List approaches
Limitations of Static Lists of Authorities
Rank  October 2009               November 2009              December 2009
      user u           auth(u)   user u           auth(u)   user u           auth(u)
1     justinbieber     393.885   justinbieber     448.815   justinbieber     433.185
2     donniewahlberg   358.286   donniewahlberg   249.988   nickjonas        249.558
3     tweetmeme        263.103   revrunwisdom     242.807   revrunwisdom     222.571
4     revrunwisdom     237.964   tweetmeme        195.379   donniewahlberg   202.996
5     mashable         229.650   addthis          186.282   tweetmeme        183.603
6     addthis          212.325   ddlovato         181.720   jonasbrothers    182.882
7     ddlovato         204.910   luansantanaevc   167.514   addthis          181.403
8     jordanknight     191.045   jordanknight     167.197   omgfacts         154.136
9     jonasbrothers    175.054   jonasbrothers    165.520   mashable         153.616
10    lilduval         174.616   mashable         164.496   johncmayer       147.241
User rankings vary across different months
White-lists can be unstable and
quickly become out-of-date
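To make the drift concrete, the sketch below treats the October 2009 top-10 from the table as a static white-list and checks how much of it survives in the later months. The slides do not spell out how Precision@K is computed, so the overlap-based definition in `precision_at_k` is an assumption of this sketch, as is the function name.

```python
def precision_at_k(white_list, current_ranking, k):
    """Fraction of the first k white-listed users still in the current top-k.

    Assumption: Precision@K is the overlap of the two top-k sets divided by k.
    """
    return len(set(white_list[:k]) & set(current_ranking[:k])) / k


# Top-10 users per month, copied from the table above.
october = ["justinbieber", "donniewahlberg", "tweetmeme", "revrunwisdom",
           "mashable", "addthis", "ddlovato", "jordanknight",
           "jonasbrothers", "lilduval"]
november = ["justinbieber", "donniewahlberg", "revrunwisdom", "tweetmeme",
            "addthis", "ddlovato", "luansantanaevc", "jordanknight",
            "jonasbrothers", "mashable"]
december = ["justinbieber", "nickjonas", "revrunwisdom", "donniewahlberg",
            "tweetmeme", "jonasbrothers", "addthis", "omgfacts",
            "mashable", "johncmayer"]

print(precision_at_k(october, november, 10))  # 0.9
print(precision_at_k(october, december, 10))  # 0.7
```

Under this definition the October list already misses one of the ten authorities a month later and three of them two months later.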
Limitations of Static Lists of Authorities
[Figure: Precision@K (0.4 to 1) versus K (authorities, 0 to 1000) for three pairs of months: Sept. 2009 & Oct. 2009, Sept. 2009 & Nov. 2009, Sept. 2009 & Dec. 2009]
We need an adaptive algorithm!
Rhea: “She who flows”
Museum of Fine Arts,
Boston
Rhea: Three Challenges
1. Maintaining user information: may be costly in terms of both memory and CPU
2. Ranking users: may require reckoning in multiple measures
3. Many elements we opt to include may be irrelevant
Maintaining User Information
Count-Min sketch:
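[Figure: Count-Min sketch, a d × w array of counters; an item i_t is mapped by the hash functions h1, h2, ..., hd to one counter per row, and each of those counters is increased by c_t]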
Reducing the processing overhead through sampling:
We apply a Bernoulli sampling scheme [PJC+15].
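A minimal sketch of the two ideas on this slide: a Count-Min sketch that keeps approximate per-user counts in fixed memory, and a Bernoulli filter that only lets a fraction p of the updates through. The class and function names, the default width and depth, and the use of salted BLAKE2b as the hash family are all assumptions of this sketch, not details taken from the slides or from [PJC+15].

```python
import hashlib
import random


class CountMinSketch:
    """d x w table of counters; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Assumption: one salted BLAKE2b hash per row stands in for h1..hd.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increase one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # The smallest of the d counters is the least polluted by collisions.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


def sampled_add(sketch, item, p, rng=random):
    """Bernoulli sampling: apply the update with probability p."""
    if rng.random() < p:
        sketch.add(item)


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("alice")
sampled_add(sketch, "bob", p=1.0)
print(sketch.estimate("alice"))  # at least 5
```

With p below 1, only a fraction of the stream touches the sketch, which lowers the per-element cost at the price of noisier counts.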
Ranking Authorities
We need to know at any time the top-K users by auth(u):
Algorithm 1: put(Top-K-Heap, key, value)
input: A Top-K-Heap structure and a key associated with a value to be inserted in the Top-K-Heap.
output: The updated Top-K-Heap.
begin
    if Top-K-Heap.size() < K then
        if Top-K-Heap.contains(key) then
            Top-K-Heap.replace(key, value);
        else
            Top-K-Heap.push(key, value);
    else
        if Top-K-Heap.contains(key) then
            Top-K-Heap.replace(key, value);
        else if value > Top-K-Heap.peek().value() then
            Top-K-Heap.pop();
            Top-K-Heap.push(key, value);
    return Top-K-Heap;
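One possible way to realise Algorithm 1 is sketched below. It assumes a min-heap from Python's `heapq` combined with a dictionary, and it handles `replace` through lazy deletion (stale heap entries are skipped when they reach the top). The class name `TopKHeap`, the `low` helper, and the lazy-deletion strategy are choices made for this sketch; the slides do not describe the internals of the structure.

```python
import heapq


class TopKHeap:
    """Keeps the K keys with the largest values."""

    def __init__(self, k):
        self.k = k
        self.values = {}  # key -> current value
        self.heap = []    # (value, key) pairs; some may be stale

    def _prune(self):
        # Drop entries whose value no longer matches the dictionary.
        while self.heap and self.values.get(self.heap[0][1]) != self.heap[0][0]:
            heapq.heappop(self.heap)

    def __contains__(self, key):
        return key in self.values

    def low(self):
        """Smallest value held, or -inf while there is still room."""
        if len(self.values) < self.k:
            return float("-inf")
        self._prune()
        return self.heap[0][0]

    def put(self, key, value):
        if key in self.values or len(self.values) < self.k:
            # replace an existing key, or push while below capacity
            self.values[key] = value
            heapq.heappush(self.heap, (value, key))
            return
        self._prune()
        if value > self.heap[0][0]:
            # pop the current minimum, then push the newcomer
            _, evicted = heapq.heappop(self.heap)
            del self.values[evicted]
            self.values[key] = value
            heapq.heappush(self.heap, (value, key))


heap = TopKHeap(2)
heap.put("a", 1)
heap.put("b", 5)
heap.put("c", 3)   # evicts "a"
heap.put("c", 10)  # replaces the value of "c"
heap.put("d", 4)   # rejected: 4 is not above the current minimum 5
print(sorted(heap.values))  # ['b', 'c']
```

Lazy deletion keeps `put` logarithmic in the heap size, but stale entries accumulate under frequent replacements, so a long-running deployment would likely need periodic rebuilding.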
Filtering-out Non-relevant Activity
While processing the stream, we may deem as an authority
a user that temporarily appears to be one.
We lose in precision!
Post-processing step:
The sample is much smaller than the stream: Ŝ ≪ S
We re-examine the elements of the sample and
filter-out the activity of users not in the Top-K-Heap
Rhea
Forming the network of authorities
Sampling the stream
Removing irrelevant content
Algorithm 2: Rhea(S, K, p)
input: A stream S, a parameter K > 0 and a probability p ∈ (0, 1].
output: A set Ŝ ⊂ S containing elements whose respective users are likely to be among the top-K w.r.t. the auth-value.
begin
    Top-K-Heap ← ∅;
    CMSin ← ∅;
    CMSout ← ∅;
    foreach s ∈ S do
        if random(0, 1] < p then
            (in, out) ← extractIndicators(s.message);
            CMSin[in] += 1;
            CMSout[out] += 1;
        authuser ← (CMSin[s.user] − CMSout[s.user]) / √(CMSin[s.user] + CMSout[s.user]);
        if authuser > Top-K-Heap.low() then
            Top-K-Heap.put(s.user, authuser);
            Ŝ.put(s);
    foreach s ∈ Ŝ do
        if s.user ∉ Top-K-Heap then
            Ŝ.remove(s);
    return Ŝ;
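The overall flow of Algorithm 2 can be sketched in a few lines. To stay short and self-contained, this sketch makes several assumptions that are not in the slides: exact `Counter` objects stand in for the two Count-Min sketches, a plain dictionary stands in for the Top-K-Heap, each stream element is a hypothetical `Post(user, target)` where `target` is the user being referenced, and the auth-value uses the square-root form defined earlier in the deck.

```python
import math
import random
from collections import Counter, namedtuple

# Hypothetical stream element: `user` posts, `target` is the user referenced.
Post = namedtuple("Post", ["user", "target"])


def auth_value(w_in, w_out):
    total = w_in + w_out
    return (w_in - w_out) / math.sqrt(total) if total > 0 else 0.0


def rhea(stream, k, p, rng=random):
    cms_in, cms_out = Counter(), Counter()  # exact stand-ins for the sketches
    top_k = {}                              # user -> auth value, at most k keys
    sample = []
    for s in stream:
        # Forming the network of authorities (Bernoulli-sampled updates).
        if rng.random() < p:
            cms_in[s.target] += 1
            cms_out[s.user] += 1
        # Sampling the stream.
        auth = auth_value(cms_in[s.user], cms_out[s.user])
        low = min(top_k.values()) if len(top_k) >= k else float("-inf")
        if auth > low:
            if s.user not in top_k and len(top_k) >= k:
                del top_k[min(top_k, key=top_k.get)]  # evict the minimum
            top_k[s.user] = auth
            sample.append(s)
    # Removing irrelevant content: keep only users still in the top-K.
    return [s for s in sample if s.user in top_k]


stream = [
    Post("a", "star"),
    Post("b", "star"),
    Post("star", "x"),
    Post("c", "star"),
    Post("star", "y"),
]
print(rhea(stream, k=1, p=1.0))  # [Post(user='star', target='x')]
```

In this toy run the first post by "a" enters the sample while the heap is still empty, and the final filtering pass removes it once "star" has taken the single top-K slot.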
Experimental Evaluation
Datasets:
1. 467 million tweets from 20 million users of Twitter
2. 263,540 answers to 83,423 questions posted by 26,752 users of StackOverflow
Questions:
1. How does Rhea compare against white-list based sampling in terms of F1-score?
2. Is Rhea able to assess the ranking relevance of the sampled documents?
3. What is the impact of the parameters involved in the execution of Rhea?
F1-score
[Figure: F1-score (0 to 1) versus K (authorities, 0 to 1000), two panels: Rhea (T) vs. WhiteList (T), and Rhea (SO) vs. WhiteList (SO)]
Normalized Discounted Cumulative Gain
[Figure: NDCG (0.4 to 1) versus K (authorities, 0 to 1000), two panels: Rhea (T) vs. WhiteList (T), and Rhea (SO) vs. WhiteList (SO)]
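For readers unfamiliar with the metric, the sketch below computes NDCG using the common log2-discounted definition. How the relevance of each sampled document is graded in the evaluation is not stated on the slide, so the relevance lists here are purely illustrative assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1), ranks starting at 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


print(ndcg([3, 2, 1]))        # 1.0: already in the ideal order
print(ndcg([1, 2, 3]) < 1.0)  # True: best items ranked last
```

A value of 1 means the ranking matches the ideal ordering; placing highly relevant items further down the list lowers the score.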
Impact of Parameters
Varying the Value of Probability p:
Using a sample of 20% of S we achieve performance almost as
good as that of using S.
Using p = 0.2 instead of p = 1 greatly reduces processing time.
Removing Filtering Step:
The performance loss is over 25 p.p. for K = 1,000 and is never less than 10 p.p. for any K examined.
Conclusion
Rhea is the first adaptive algorithm for sampling
authoritative content from social activity streams.
We exposed the dynamic nature of the task.
We introduced a measure to identify authoritative users.
Rhea employs several techniques to achieve significantly
improved performance with regard to recall, precision, and
ranking accuracy.
References I
[ACD+08] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne.
Finding high-quality content in social media.
In Proc. of the Int. Conf. on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11-12, 2008, pages 183–194, 2008.
[BBC+13] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, and Giuliano Vesci.
Choosing the right crowd: expert finding in social networks.
In Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18-22, 2013, pages 637–648, 2013.
[GSB+12] Saptarshi Ghosh, Naveen Kumar Sharma, Fabrício Benevenuto, Niloy Ganguly, and P. Krishna Gummadi.
Cognos: crowdsourcing search for topic experts in microblogs.
In The 35th Int. ACM SIGIR Conf. on research and development in Information Retrieval, SIGIR ’12, Portland, OR, USA, August 12-16, 2012, pages 575–590, 2012.
[GZB+13] Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Kumar Sharma, Niloy Ganguly, and P. Krishna Gummadi.
On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream.
In 22nd ACM Int. Conf. on Information and Knowledge Management, CIKM’13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 1739–1744, 2013.
[JA07] Pawel Jurczyk and Eugene Agichtein.
Discovering authorities in question answer communities by using link analysis.
In Proc. of the 16th ACM Conf. on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 919–922, 2007.
[PC11] Aditya Pal and Scott Counts.
Identifying topical authorities in microblogs.
In Proc. of the 4th International Conference on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011, pages 45–54, 2011.
References II
[PJC+15] Deepan Subrahmanian Palguna, Vikas Joshi, Venkatesan T. Chakaravarthy, Ravi Kothari, and L. Venkata Subramaniam.
Analysis of sampling algorithms for twitter.
In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 967–973, 2015.
[WLP+12] Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson, and Markus Strohmaier.
It’s not in their tweets: Modeling topical expertise of twitter users.
In 2012 Int. Conf. on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 Int. Conf. on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012, pages 91–100, 2012.
[ZAA07] Jun Zhang, Mark S. Ackerman, and Lada A. Adamic.
Expertise networks in online communities: structure and algorithms.
In Proc. of the 16th Int. Conf. on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 221–230, 2007.
[ZBG+16] Muhammad Bilal Zafar, Parantapa Bhattacharya, Niloy Ganguly, Saptarshi Ghosh, and Krishna P. Gummadi.
On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs.
In Proc. of the 19th ACM Conf. on Computer-Supported Cooperative Work & Social Computing, CSCW 2016, San Francisco, CA, USA, February 27 - March 2, 2016, pages 437–450, 2016.
thank you!
for further details email me at:
p.liakos@di.uoa.gr
Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams

  • 1. Rhea: Adaptively Sampling Authoritative Content from Social Activity Streams Panagiotis Liakos - Alexandros Ntoulas - Alex Delis University of Athens, Greece IEEE BigData 2017 December 11th-14th, 2017 - Boston, MA
  • 2. UoA Panagiotis Liakos Rhea-• Motivation 2/26
  • 3. 500 million tweets sent each day! UoA Panagiotis Liakos Rhea-• Motivation 2/26
  • 4. Motivation Mining social activity in real-time is valuable for numerous applications: opinion mining content recommendation emerging news detection Processing the full activity stream of a social network is prohibitive: storage computational cost UoA Panagiotis Liakos Rhea-• Motivation 3/26
  • 5. Motivation Mining social activity in real-time is valuable for numerous applications: opinion mining content recommendation emerging news detection Processing the full activity stream of a social network is prohibitive: storage computational cost Not all content is useful: 90% of tweets is conversational or spam! Workaround: take a sample of the social activity and use it to feed into applications! UoA Panagiotis Liakos Rhea-• Motivation 3/26
  • 6. Motivation Mining social activity in real-time is valuable for numerous applications: opinion mining content recommendation emerging news detection Processing the full activity stream of a social network is prohibitive: storage computational cost Not all content is useful: 90% of tweets is conversational or spam! Workaround: take a sample of the social activity and use it to feed into applications! Our approach: Sample the content published by authorities UoA Panagiotis Liakos Rhea-• Motivation 3/26
  • 7. Related Work Social Activity Stream Sampling: White-lists of users [GSB+12, WLP+12, GZB+13, ZBG+16]. Focus is mainly on Twitter. Our approach is adaptive and does not rely on static white-lists. Authoritative users in Online Social Networks: Network attributes [ZAA07, JA07, ACD+08, PC11, BBC+13]. We focus on streams, not networks. UoA Panagiotis Liakos Rhea-• Related Work 4/26
  • 8. Contribution We propose Rhea: A sampling algorithm for authoritative content that forms a network of authorities as it processes a social activity stream, and samples only the activity of the top-K authoritative users. We build on: Network-based measures and their Our findings on the disadvantages adaptation in a streaming setting of white-list approaches We outperform contemporary approaches with regard to precision, recall, and ranking accuracy! UoA Panagiotis Liakos Rhea-• Contribution 5/26
  • 9. Network-based measures UoA Panagiotis Liakos Rhea-• Network-based measures 6/26
  • 10. Network of Authorities from Social Activity
  • 12. Ranking the Authorities: The z-score (Zhang, Ackerman, and Adamic, WWW 2007 [ZAA07]) builds on positive and negative predictors of expertise: $z(u) = \frac{a(u) - q(u)}{\sqrt{a(u) + q(u)}}$, where $a(u)$ is the number of questions $u$ has answered and $q(u)$ is the number of questions $u$ has asked.
  • 13. Ranking the Authorities: We propose auth-value, a measure for a wide range of social networking sites: $auth(u) = \frac{in(u) - out(u)}{\sqrt{in(u) + out(u)}}$, where $in(u)$ is the weighted in-degree of $u$ in the network of authorities and $out(u)$ is the respective weighted out-degree.
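The auth-value formula can be evaluated directly from the two degree counts. The sketch below is a minimal illustration, not the authors' code; the function name `auth_value` and the choice to return 0 for a user with no recorded activity are assumptions made here, since the slides do not say how the zero-degree case is handled.

```python
import math


def auth_value(in_weight: float, out_weight: float) -> float:
    """auth(u) = (in(u) - out(u)) / sqrt(in(u) + out(u))."""
    total = in_weight + out_weight
    if total == 0:
        # Assumption: a user with no in- or out-edges gets a neutral score.
        return 0.0
    return (in_weight - out_weight) / math.sqrt(total)


# A user who is only referenced by others scores high; a user who only
# references others scores negative.
print(auth_value(100, 0))
print(auth_value(9, 16))
```

Dividing by the square root of the total rewards users with a large volume of incoming activity while still penalizing those whose activity is mostly outgoing.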
  • 14. Our findings on White-List approaches
  • 15. Limitations of Static Lists of Authorities: Top-10 users by auth(u) in three consecutive months:
      Rank | October 2009 (user, auth) | November 2009 (user, auth) | December 2009 (user, auth)
      1    | justinbieber 393.885      | justinbieber 448.815       | justinbieber 433.185
      2    | donniewahlberg 358.286    | donniewahlberg 249.988     | nickjonas 249.558
      3    | tweetmeme 263.103         | revrunwisdom 242.807       | revrunwisdom 222.571
      4    | revrunwisdom 237.964      | tweetmeme 195.379          | donniewahlberg 202.996
      5    | mashable 229.650          | addthis 186.282            | tweetmeme 183.603
      6    | addthis 212.325           | ddlovato 181.720           | jonasbrothers 182.882
      7    | ddlovato 204.910          | luansantanaevc 167.514     | addthis 181.403
      8    | jordanknight 191.045      | jordanknight 167.197       | omgfacts 154.136
      9    | jonasbrothers 175.054     | jonasbrothers 165.520      | mashable 153.616
      10   | lilduval 174.616          | mashable 164.496           | johncmayer 147.241
    User rankings vary across months. White-lists can be unstable and quickly become out of date.
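To make the instability concrete, the overlap between two monthly top-K lists can be computed from the table. The sketch below assumes Precision@K means the fraction of the older list's top-K users that still appear in the newer list's top-K; the slides do not spell out the definition, and the function name `precision_at_k` is chosen here for illustration. October 2009 is used as the reference list because the table does not include September.

```python
def precision_at_k(reference, current, k):
    """Fraction of the top-k of `reference` that is still in the top-k of `current`."""
    return len(set(reference[:k]) & set(current[:k])) / k


october = ["justinbieber", "donniewahlberg", "tweetmeme", "revrunwisdom", "mashable",
           "addthis", "ddlovato", "jordanknight", "jonasbrothers", "lilduval"]
november = ["justinbieber", "donniewahlberg", "revrunwisdom", "tweetmeme", "addthis",
            "ddlovato", "luansantanaevc", "jordanknight", "jonasbrothers", "mashable"]
december = ["justinbieber", "nickjonas", "revrunwisdom", "donniewahlberg", "tweetmeme",
            "jonasbrothers", "addthis", "omgfacts", "mashable", "johncmayer"]

print(precision_at_k(october, november, 10))  # 0.9
print(precision_at_k(october, december, 10))  # 0.7
```

Under this definition, a white-list built from the October top 10 already misses one of the top-10 authorities a month later and three of them two months later.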
  • 17. Limitations of Static Lists of Authorities: [Figure: Precision@K versus K (authorities), for K up to 1000, comparing the Sept. 2009 list against Oct. 2009, Nov. 2009, and Dec. 2009.] We need an adaptive algorithm!
  • 18. Rhea: “She who flows” (Museum of Fine Arts, Boston)
  • 19. Rhea: Three Challenges: (1) Maintaining user information may be costly in terms of both memory and CPU. (2) Ranking users may require reckoning in multiple measures. (3) Many elements we opt to include may be irrelevant.
  • 20. Maintaining User Information: Count-Min sketch. [Figure: a Count-Min sketch of d rows and w counters per row; an item i_t is hashed by h1, h2, ..., hd and its count c_t is added to one counter in each row.] Reducing the processing overhead through sampling: we apply a Bernoulli sampling scheme [PJC+15].
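A minimal Python sketch of the two building blocks on this slide follows. It is an illustration under stated assumptions rather than the authors' implementation: the class name `CountMinSketch`, the use of salted BLAKE2b digests as the d hash functions, and the helper `bernoulli_sample` are all choices made here.

```python
import hashlib
import random


class CountMinSketch:
    """d rows of w counters; each item updates one counter per row."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Assumption: one salted BLAKE2b hash per row stands in for h1..hd.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate counters, so the minimum is the best guess.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))


def bernoulli_sample(stream, p: float, rng: random.Random):
    """Keep each element independently with probability p."""
    for element in stream:
        if rng.random() < p:
            yield element


sketch = CountMinSketch(width=64, depth=4)
for _ in range(5):
    sketch.add("alice")
print(sketch.estimate("alice"))  # at least 5; the sketch never underestimates
```

The sketch uses a fixed d × w block of memory regardless of how many distinct users appear, and Bernoulli sampling reduces how many stream elements trigger an update at all.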
  • 21. Ranking Authorities: We need to know at any time the top-K users by auth(u).
      Algorithm 1: put(Top-K-Heap, key, value)
      input: A Top-K-Heap structure and a key associated with a value to be inserted in the Top-K-Heap.
      output: The updated Top-K-Heap.
      begin
        if Top-K-Heap.size() < K then
          if Top-K-Heap.contains(key) then
            Top-K-Heap.replace(key, value);
          else
            Top-K-Heap.push(key, value);
        else
          if Top-K-Heap.contains(key) then
            Top-K-Heap.replace(key, value);
          else if value > Top-K-Heap.peek().value() then
            Top-K-Heap.pop();
            Top-K-Heap.push(key, value);
        return Top-K-Heap;
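The semantics of the put operation can be sketched in Python as follows. For brevity this version stores the entries in a plain dictionary and finds the minimum by scanning, which costs O(K) per eviction; it is an assumption-laden stand-in, and a real Top-K-Heap would use an indexed heap to get logarithmic updates. The class name `TopK` is hypothetical.

```python
class TopK:
    """Keeps at most k (key, value) pairs, evicting the smallest value when full."""

    def __init__(self, k: int):
        self.k = k
        self.values = {}

    def __contains__(self, key) -> bool:
        return key in self.values

    def low(self) -> float:
        """Smallest stored value, or -inf while there is still room."""
        if len(self.values) < self.k:
            return float("-inf")
        return min(self.values.values())

    def put(self, key, value) -> None:
        if key in self.values:
            # Known key: replace its value, whether or not the structure is full.
            self.values[key] = value
        elif len(self.values) < self.k:
            # Room left: push the new key.
            self.values[key] = value
        else:
            # Full: the new key enters only if it beats the current minimum.
            weakest = min(self.values, key=self.values.get)
            if value > self.values[weakest]:
                del self.values[weakest]
                self.values[key] = value


top = TopK(2)
top.put("a", 1.0)
top.put("b", 2.0)
top.put("c", 0.5)  # rejected: smaller than the current minimum
top.put("c", 3.0)  # evicts "a"
print(sorted(top.values))  # ['b', 'c']
```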
  • 22. Filtering Out Non-relevant Activity: While processing the stream, we may deem a user an authority who only temporarily appears to be one, and so we lose precision. Post-processing step: the sample is much smaller than the stream, $|\hat{S}| \ll |S|$. We re-examine the elements of the sample and filter out the activity of users not in the Top-K-Heap.
  • 23. Rhea: The algorithm has three phases: forming the network of authorities, sampling the stream, and removing irrelevant content.
      Algorithm 2: Rhea(S, K, p)
      input: A stream S, a parameter K > 0, and a probability p ∈ (0, 1].
      output: A set Ŝ ⊂ S containing elements whose respective users are likely to be among the top-K w.r.t. the auth-value.
      begin
        Top-K-Heap ← ∅; CMS_in ← ∅; CMS_out ← ∅;
        foreach s ∈ S do
          if random(0, 1] < p then
            (in, out) ← extractIndicators(s.message);
            CMS_in[in] += 1;
            CMS_out[out] += 1;
          auth_user ← (CMS_in[s.user] − CMS_out[s.user]) / √(CMS_in[s.user] + CMS_out[s.user]);
          if auth_user > Top-K-Heap.low() then
            Top-K-Heap.put(s.user, auth_user);
            Ŝ.put(s);
        foreach s ∈ Ŝ do
          if s.user ∉ Top-K-Heap then
            Ŝ.remove(s);
        return Ŝ;
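The three phases of Algorithm 2 can be put together in a short, self-contained Python sketch. Several simplifications are assumptions made here and not part of the slides: exact `Counter` objects replace the two Count-Min sketches, a dictionary replaces the Top-K-Heap, the `Activity` record is hypothetical, and `extract_indicators` uses a made-up rule (a message starting with "RT @name" gives `name` an in-edge and the poster an out-edge).

```python
import math
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    user: str
    message: str


def extract_indicators(activity):
    """Hypothetical rule: 'RT @name ...' credits `name` and debits the poster."""
    if activity.message.startswith("RT @"):
        parts = activity.message[4:].split()
        if parts:
            return parts[0], activity.user
    return None


def rhea(stream, k, p, rng=None):
    rng = rng or random.Random()
    cms_in, cms_out = Counter(), Counter()  # exact counts stand in for the sketches
    top_k = {}                              # user -> auth-value, at most k entries
    sample = []

    for s in stream:
        # Phase 1: form the network of authorities on a Bernoulli sample.
        if rng.random() < p:
            indicators = extract_indicators(s)
            if indicators is not None:
                target, source = indicators
                cms_in[target] += 1
                cms_out[source] += 1

        # Phase 2: sample the element if its user currently ranks in the top k.
        total = cms_in[s.user] + cms_out[s.user]
        auth = (cms_in[s.user] - cms_out[s.user]) / math.sqrt(total) if total else 0.0
        low = min(top_k.values()) if len(top_k) >= k else float("-inf")
        if auth > low:
            if s.user not in top_k and len(top_k) >= k:
                del top_k[min(top_k, key=top_k.get)]  # evict the weakest user
            top_k[s.user] = auth
            sample.append(s)

    # Phase 3: drop elements whose users did not stay in the top k.
    return [s for s in sample if s.user in top_k]


stream = [
    Activity("bob", "RT @alice hello"),
    Activity("alice", "original post"),
    Activity("carol", "RT @alice again"),
    Activity("alice", "another post"),
]
for kept in rhea(stream, k=1, p=1.0):
    print(kept.user, "-", kept.message)
```

In this toy run the first element (by bob) is sampled while the structure still has room, then discarded by the final filtering pass once alice has displaced bob, which is the situation the post-processing step is meant to correct.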
  • 24. Experimental Evaluation: Datasets: (1) 467 million tweets from 20 million users of Twitter; (2) 263,540 answers to 83,423 questions posted by 26,752 users of StackOverflow. Questions: (1) How does Rhea compare against white-list based sampling in terms of F1-score? (2) Is Rhea able to assess the ranking relevance of the sampled documents? (3) What is the impact of the parameters involved in the execution of Rhea?
  • 25. F1-score: [Figure: two panels plotting F1-score against K (authorities), for K up to 1000: Rhea (T) vs. WhiteList (T), and Rhea (SO) vs. WhiteList (SO).]
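For reference, the F1-score is the harmonic mean of precision and recall. The sketch below assumes the evaluation compares the set of sampled elements against a ground-truth set of elements published by the true top-K authorities; the slides do not state how the ground truth is built, and the function name `f1_score` and the toy identifiers are illustrative.

```python
def f1_score(sampled, relevant) -> float:
    """Harmonic mean of precision and recall over two collections of element ids."""
    sampled, relevant = set(sampled), set(relevant)
    hits = len(sampled & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(sampled)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Two of four sampled elements are relevant, out of six relevant elements overall:
# precision = 1/2 and recall = 1/3.
print(f1_score({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}))
```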
  • 26. Normalized Discounted Cumulative Gain: [Figure: two panels plotting NDCG against K (authorities), for K up to 1000: Rhea (T) vs. WhiteList (T), and Rhea (SO) vs. WhiteList (SO).]
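NDCG compares the discounted gain of a ranking with that of the ideal ordering of the same items. The sketch below uses the common formulation with a log2 rank discount; how the gain of each sampled document is defined in the evaluation is not stated on the slide, so the gain values here are placeholders.

```python
import math


def dcg(gains) -> float:
    """Discounted cumulative gain: the gain at rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains) -> float:
    """DCG normalized by the DCG of the same gains sorted in descending order."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


print(ndcg([3, 2, 1]))  # 1.0: already in the ideal order
print(ndcg([1, 2, 3]))  # below 1.0: the most valuable item is ranked last
```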
  • 27. Impact of Parameters: Varying the value of probability p: using a sample of 20% of S, we achieve performance almost as good as that of using all of S, and using p = 0.2 instead of p = 1 greatly reduces processing time. Removing the filtering step: the resulting loss exceeds 25 percentage points for K = 1,000 and is never less than 10 percentage points for any K examined.
  • 28. Conclusion: Rhea is the first adaptive algorithm for sampling authoritative content from social activity streams. We exposed the dynamic nature of the task. We introduced a measure to identify authoritative users. Rhea employs several techniques to achieve significantly improved performance with regard to recall, precision, and ranking accuracy.
  • 29. References I
      [ACD+08] Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, and Gilad Mishne. Finding high-quality content in social media. In Proc. of the Int. Conf. on Web Search and Web Data Mining, WSDM 2008, Palo Alto, California, USA, February 11-12, 2008, pages 183–194, 2008.
      [BBC+13] Alessandro Bozzon, Marco Brambilla, Stefano Ceri, Matteo Silvestri, and Giuliano Vesci. Choosing the right crowd: expert finding in social networks. In Joint 2013 EDBT/ICDT Conferences, EDBT ’13 Proceedings, Genoa, Italy, March 18-22, 2013, pages 637–648, 2013.
      [GSB+12] Saptarshi Ghosh, Naveen Kumar Sharma, Fabrício Benevenuto, Niloy Ganguly, and P. Krishna Gummadi. Cognos: crowdsourcing search for topic experts in microblogs. In The 35th Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, SIGIR ’12, Portland, OR, USA, August 12-16, 2012, pages 575–590, 2012.
      [GZB+13] Saptarshi Ghosh, Muhammad Bilal Zafar, Parantapa Bhattacharya, Naveen Kumar Sharma, Niloy Ganguly, and P. Krishna Gummadi. On sampling the wisdom of crowds: random vs. expert sampling of the Twitter stream. In 22nd ACM Int. Conf. on Information and Knowledge Management, CIKM ’13, San Francisco, CA, USA, October 27 - November 1, 2013, pages 1739–1744, 2013.
      [JA07] Pawel Jurczyk and Eugene Agichtein. Discovering authorities in question answer communities by using link analysis. In Proc. of the 16th ACM Conf. on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pages 919–922, 2007.
      [PC11] Aditya Pal and Scott Counts. Identifying topical authorities in microblogs. In Proc. of the 4th Int. Conf. on Web Search and Web Data Mining, WSDM 2011, Hong Kong, China, February 9-12, 2011, pages 45–54, 2011.
  • 30. References II
      [PJC+15] Deepan Subrahmanian Palguna, Vikas Joshi, Venkatesan T. Chakaravarthy, Ravi Kothari, and L. Venkata Subramaniam. Analysis of sampling algorithms for Twitter. In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 967–973, 2015.
      [WLP+12] Claudia Wagner, Vera Liao, Peter Pirolli, Les Nelson, and Markus Strohmaier. It’s not in their tweets: Modeling topical expertise of Twitter users. In 2012 Int. Conf. on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 Int. Conf. on Social Computing, SocialCom 2012, Amsterdam, Netherlands, September 3-5, 2012, pages 91–100, 2012.
      [ZAA07] Jun Zhang, Mark S. Ackerman, and Lada A. Adamic. Expertise networks in online communities: structure and algorithms. In Proc. of the 16th Int. Conf. on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8-12, 2007, pages 221–230, 2007.
      [ZBG+16] Muhammad Bilal Zafar, Parantapa Bhattacharya, Niloy Ganguly, Saptarshi Ghosh, and Krishna P. Gummadi. On the wisdom of experts vs. crowds: Discovering trustworthy topical news in microblogs. In Proc. of the 19th ACM Conf. on Computer-Supported Cooperative Work & Social Computing, CSCW 2016, San Francisco, CA, USA, February 27 - March 2, 2016, pages 437–450, 2016.
  • 31. Thank you! For further details, email me at: p.liakos@di.uoa.gr