CUbRIK SummerSchool2014 
CUbRIK Summer School 0 
Mining, Analyzing and Exploiting Community Feedback on the Web 
Sergiu Chelaru 
L3S Research Center, Hannover
2-4/07/2014 CUbRIK Summer School 1 
Community Feedback on the Web 
Comments: a way to communicate with users and/or communities
Outline 
Comment-Centric Feedback 
Comment Ratings 
Polarized Content 
Controversial Comments 
Trolls 
Social Feedback 
Query Result Characteristics 
Social Features 
Learning to Rank using Social Features 
Community Sentiment in Web Queries 
Analysis of Sentiment in Web Queries 
Detecting Query Sentiment 
Two Application Scenarios 
Summary and Contributions
Comment Centric Feedback 
YouTube dataset 
756 Google Zeitgeist keywords 
50 videos, metadata, 500 comments 
67k videos, 6 million comments
Yahoo! News dataset 
Yahoo! RSS Feed, Sept-Dec 2011 
27k news stories 
5.4 million comments
Descriptive statistics for the YouTube and Yahoo! News corpora.
Comment-Centric Feedback 
Distribution of number of comments for videos in YouTube and news stories in Yahoo! News.
Comment Ratings 
Distribution of comment ratings for (a) YouTube, and (b) Yahoo! News. 
Term Analysis of Rated Comments 
Top-50 terms according to their MI values for accepted comments (with high comment ratings) vs. not accepted comments (with low comment ratings).
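A minimal sketch of how terms can be ranked by mutual information (MI) with a comment class. The slides do not give the exact MI formulation, so this uses the standard document-frequency form over a 2x2 term/class contingency table; the function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(n11, n10, n01, n00):
    """MI (in bits) between a term and a class from a 2x2 contingency table.

    n11: docs in the class containing the term
    n10: docs outside the class containing the term
    n01: docs in the class without the term
    n00: docs outside the class without the term
    """
    n = n11 + n10 + n01 + n00
    mi = 0.0
    # Each tuple: (joint count, term marginal, class marginal)
    for joint, term_marg, class_marg in (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    ):
        if joint > 0:  # a zero cell contributes nothing to the sum
            mi += (joint / n) * math.log2(n * joint / (term_marg * class_marg))
    return mi

def top_terms_by_mi(docs, labels, target, k):
    """Rank terms by MI with the class `target`; docs are lists of tokens."""
    n_docs = len(docs)
    n_class = sum(1 for label in labels if label == target)
    df_all, df_class = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_all[term] += 1
            if label == target:
                df_class[term] += 1
    scores = {}
    for term, df in df_all.items():
        n11 = df_class[term]
        n10 = df - n11
        n01 = n_class - n11
        n00 = n_docs - df - n01
        scores[term] = mutual_information(n11, n10, n01, n00)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

MI is symmetric in the two classes, so a high-scoring term may be typical of either side; assigning each term to the class in which it is over-represented would be a separate step.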
Term Analysis of Rated Comments 
Examples of comments belonging to the category “accepted”.
Term Analysis of Rated Comments 
Examples of comments belonging to the category “unaccepted”.
Sentiment Analysis of Rated Comments 
Do the language and sentiment used by the community influence comment ratings?
Three disjoint partitions:
5Neg: comments with rating score $r \le -5$
0Dist: comments with rating score $r = 0$
5Pos: comments with rating score $r \ge 5$
Comparison of mean senti-values for comments with different kinds of community ratings in (a) YouTube and (b) Yahoo! News.
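The partitioning above can be sketched as follows. The slides do not say how senti-values are computed, so this assumes a hypothetical lexicon mapping words to signed sentiment scores and averages the scores of the lexicon words in each comment; only the three rating thresholds come from the slide.

```python
from statistics import mean

def partition(rating):
    """Map a comment rating to 5Neg, 0Dist or 5Pos, or None if it fits none."""
    if rating <= -5:
        return "5Neg"
    if rating == 0:
        return "0Dist"
    if rating >= 5:
        return "5Pos"
    return None  # ratings strictly between -5 and 5 (except 0) are left out

def comment_score(text, lexicon):
    """Average lexicon score of the words in a comment (assumed scoring scheme)."""
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return mean(scores) if scores else 0.0

def mean_senti_by_partition(comments, lexicon):
    """comments: iterable of (text, rating) pairs."""
    buckets = {"5Neg": [], "0Dist": [], "5Pos": []}
    for text, rating in comments:
        name = partition(rating)
        if name is not None:
            buckets[name].append(comment_score(text, lexicon))
    # Partitions without any comment are omitted from the result
    return {name: mean(vals) for name, vals in buckets.items() if vals}
```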
Ratings and Polarized Content 
Variance of Comment Ratings as Indicator for Polarizing Videos
Ratings and Polarized Content 
Variance of Comment Ratings as Indicator for Polarizing Topics
Predicting Comment Ratings 
Classify comments into accepted by the community and not accepted
AC_POS
AC_NEG
THRESH-0
Text processing: stopword removal, stemming
$(c_1, l_1), \dots, (c_n, l_n)$
Rating thresholds for “accepted” vs. “not accepted”
Different training set sizes T
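A minimal sketch of how labeled training pairs could be built from rated comments. The slide does not define the threshold values behind AC_POS, AC_NEG and THRESH-0, so `pos_threshold` and `neg_threshold` are hypothetical parameters, the stopword list is a tiny illustrative one, stemming is omitted, and taking up to T examples per class is an assumption.

```python
# Tiny illustrative stopword list; a real setup would use a full list.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "this"}

def preprocess(text):
    """Lowercase, strip surrounding punctuation, drop stopwords (no stemming here)."""
    tokens = [t.strip(".,!?\"'()").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOPWORDS]

def build_training_pairs(rated_comments, pos_threshold, neg_threshold, size_t):
    """Return (tokens, label) pairs: +1 for accepted, -1 for not accepted.

    rated_comments: iterable of (text, rating) pairs. Comments whose rating
    lies strictly between the two thresholds are skipped.
    """
    accepted, rejected = [], []
    for text, rating in rated_comments:
        if rating >= pos_threshold:
            accepted.append((preprocess(text), 1))
        elif rating <= neg_threshold:
            rejected.append((preprocess(text), -1))
    # Assumption: the training set holds up to size_t examples per class
    return accepted[:size_t] + rejected[:size_t]
```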
Predicting Comment Ratings 
Comment rating classification: BEPs for different training set sizes T and different rating thresholds.
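For reference, one common way to compute a precision-recall break-even point (BEP) from classifier scores: with P positive examples, precision and recall coincide at rank P of the score-sorted list. The slides do not state how their BEPs were obtained, so this is a sketch under that assumption.

```python
def break_even_point(scores, labels):
    """BEP as precision at rank P, where P is the number of positives.

    scores: classifier confidence per example (higher = more likely positive)
    labels: 1 for positive, 0 for negative
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_pos])
    # At rank n_pos, precision = hits / n_pos and recall = hits / n_pos
    return hits / n_pos
```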
Predicting Comment Ratings 
Precision-recall curves for comment rating prediction.
Controversial Comments 
On many platforms
“For some reason, a lot of you thing that rich people pay NO taxes? They pay taxes even though 50% of Americans do not. What Obama wants to do is RAISE their taxes. That’s not fair. Let’s make sure everyone pays taxes and politicians use tax money in a sensible way before we raise taxes on a few.”
10 
15 
comment_rating = #likes - #dislikes
Controversial Comments 
Examples of comments belonging to the categories “controversial” and “non-controversial”.
Term Analysis of Controversial Comments
bank: banks are criticized for their role in the financial crisis; such comments are approved by a large majority of the users.
Top-20 terms according to their MI values for controversial vs. non-controversial comments.
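Analysis of likes and dislikes
Comment Approval Ratio
$\Phi_c = \frac{l_c}{l_c + d_c}$
$l_c$ ($d_c$): number of likes (dislikes) for a comment $c$
Controversy Interval
$0.5 - \delta_C \le \Phi_c \le 0.5 + \delta_C$, $\delta_C = 0.1$
Non-controversy Interval
$0.5 - \delta_{NC} \le \Phi_c \le 0.5 + \delta_{NC}$, $\delta_{NC} \in \{0.1, 0.2, 0.3\}$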
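The approval ratio and the controversy interval can be sketched as below. The function names are illustrative, and only the controversy-interval test is shown; the handling of comments without any rating (raising an error) is an assumption.

```python
def comment_rating(likes, dislikes):
    """Net rating of a comment: number of likes minus number of dislikes."""
    return likes - dislikes

def approval_ratio(likes, dislikes):
    """Share of likes among all ratings a comment received."""
    total = likes + dislikes
    if total == 0:
        raise ValueError("comment has no ratings")
    return likes / total

def in_controversy_interval(likes, dislikes, delta_c=0.1):
    """True if the approval ratio lies within 0.5 +/- delta_c."""
    phi = approval_ratio(likes, dislikes)
    return 0.5 - delta_c <= phi <= 0.5 + delta_c
```

A comment with 12 likes and 13 dislikes has an approval ratio of 0.48 and falls inside the interval, while one with 20 likes and 5 dislikes (ratio 0.8) does not.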
Analysis of likes and dislikes
(a) Distribution of number of comments per comment approval intervals for distinct thresholds for the number of received ratings. (b) Controversy interval vs. accepted (positive) and not accepted (negative) intervals. 
Predicting Controversial Comments 
BEPs for controversial comment prediction.
Note that:
• BEPs are relatively low
• Results are implementable
• Trading recall for precision leads to applicable results: P = 0.859 for R = 0.1
Precision-recall curve for the classification of controversial comments for $\delta_{NC} = 0.4$
Trolls on the Social Web 
Trolls: “posting disruptive, false or offensive comments to fool and provoke other users” 
Study comment rating feedback for troll/non-troll users
Study methods for automatically detecting the presence of trolls
Slashdot No More Trolls: 200 trolls, 200 non-trolls, 24 comments per user
YouTube dataset
Trolls on the Social Web 
Johny1 
Mexican, Puerto Rican, Cuban ... who cares?
I love that this Negro says/ sings: "If I WERE a boy." 
I would feel awful about admitting being a Republican. 
I hope Britney Slut will die of Swine flu. 
I love that this Negro says/ sings: "If I WERE a boy." 
All I want is that she doesn't rape valuable classical songs. Even a diva like this Beyoncé doesn't have the right to commit such a crime. 
Johny2 
you obviously have no idea what you are talking about. 
Shut up you douchebag. 
Moron. If the religious groups did not subject their will on to everyone, there would not even need to be an atheist title. No one would care.
Perhaps people with speak issues should be euthanized.
Kinda the point there, dipshit.
You are quite the ignorant fuckwit. They do look like crap, you have no idea what you're talking about. Most likely don't have the device either. Moron.
Examples of troll users in YouTube (Johny1) and Slashdot (Johny2).
Term Analysis of Troll Comments 
Top-20 terms according to their MI values for troll vs. non-troll comments.
Trolls and Community Ratings 
Comment rating distribution for comments from troll users and non-troll users in (a) YouTube and (b) Slashdot.
Content-based Troll Prediction 
Linear SVM, 2-fold cross validation 
BEP: 0.68 for YouTube, 0.74 for Slashdot
Social Feedback
Contribution 
What are the characteristics of the YouTube query results with respect to the social features? 
How effective is each individual feature for ranking the videos for a given query? 
Can social features help improve video retrieval performance in a learning to rank (LETOR) framework?
Data Collection 
Query Sets 
1.4k popular queries ($Q_p$)
1.3k tail queries ($Q_t$)
Video Sets
$V_p$: 132k videos retrieved for $Q_p$
$V_t$: 63k videos retrieved for $Q_t$
Query Result Characteristics 
Category distribution of (a) popular, and (b) tail queries 
Query Result Characteristics 
Number of results (reported by YouTube) for (a) popular, and (b) tail queries 
Query Result Characteristics 
Avg. no. of (a) views, (b) likes, (c) dislikes and (d) comments vs. video rank in the query results
Data Annotation 
100 queries, 100 videos/query => 10k videos
Basic and Social Features 
The list of all the basic and social features (F) employed in our work.
Effectiveness of Features 
Fraction of queries for which a given feature yields the ranking with the highest NDCG@10 for (a) popular, and (b) tail queries 
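A minimal sketch of evaluating a single feature as a ranker: sort a query's videos by the feature value and score the resulting order with NDCG@k. The slides do not give the NDCG variant, so the common gain form 2^rel - 1 with a log2 rank discount is assumed; the field names `views` and `rel` in the usage are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def rank_by_feature(videos, feature):
    """Order a query's videos by one feature value, highest first."""
    return sorted(videos, key=lambda v: v[feature], reverse=True)
```

Usage: `ndcg_at_k([v["rel"] for v in rank_by_feature(videos, "views")], 10)` gives the NDCG@10 of the ranking induced by the view count for one query.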
Video Retrieval Framework 
7 LETOR algorithms
Feature selection: GAS, MMR
(q, F, r)
5-fold cross validation
NDCG@10, NDCG@5
For $k \in \{1, \dots, \#\mathrm{features}\}$:
Top-k feature selection
Build k-dimensional query-video pairs
Train 7 LETOR models on the train queries + videos
Run the prediction models on the test queries + videos
Compute NDCG
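The slides name MMR as one feature selection strategy without giving its formula. The sketch below is a generic greedy maximal-marginal-relevance selection, not necessarily the authors' variant: `importance` (for example, the NDCG a feature achieves on its own) and `similarity` (for example, a correlation between the rankings two features induce) are hypothetical inputs, and `lam` is an assumed trade-off parameter.

```python
def mmr_select(importance, similarity, k, lam=0.5):
    """Greedily pick k features, trading importance against redundancy.

    importance: dict feature -> score
    similarity: dict (feature_a, feature_b) -> similarity, treated as symmetric
    lam: weight of importance; (1 - lam) weights the redundancy penalty
    """
    selected = []
    candidates = set(importance)

    def pair_sim(f, g):
        return similarity.get((f, g), similarity.get((g, f), 0.0))

    while candidates and len(selected) < k:
        def mmr_score(f):
            # Redundancy: highest similarity to any already selected feature
            redundancy = max((pair_sim(f, g) for g in selected), default=0.0)
            return lam * importance[f] - (1 - lam) * redundancy

        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Running the selection for every k from 1 to the number of features yields the nested feature sets on which the LETOR models are trained and compared.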
LETOR Results for Popular+Tail 
Average NDCG@10 scores for LETOR algorithms using the basic and best-k features obtained with the GAS and MMR strategies for the popular and tail query sets (for bold cases, differences from the baseline are statistically significant). For GAS and MMR, we also denote the number of selected features (k) in parentheses.
Community Sentiment in Web Queries 
Contribution 
Analysis of sentiment in Web queries 
Study the applicability of state-of-the-art sentiment analysis methods for detecting the sentiment of the queries 
Employ query sentiment detectors in two use cases, query recommendation and controversial topic discovery
What is Sentiment Analysis?
Examples of positive (top) and negative (bottom) opinionated reviews for the movie Madagascar 3: Europe’s Most Wanted.
Data Collection 
50 controversial topics from procon.org and Wikipedia (e.g., abortion, iPhone, marijuana)
AOL query log 
31,053 queries 
7,651 annotated queries 
Templates for gathering queries (along with the number of manually annotated queries per template)
Sentiment in Web Queries 
Queries and sentiment categories for the topic “George Bush”.
Sentiment in Query Results 
Traces of bias in top-k query results 
60 queries, 600 titles, 600 snippets 
Sentiment distribution of (a) query result titles, and (b) query result snippets for the queries from each sentiment class. 
Post-Retrieval Analysis 
Post-retrieval behaviour of the user
MSN log, 5 topics, 1.5k queries, 79 opinionated, 222 clicked pages
Sentiment distribution of the clicked results for (a) positive queries, and (b) negative queries. 
Detecting Query Sentiment 
Study state-of-the-art methods to detect the sentiment class of a query 
Feature vectors 
Query text, top-10 result titles and snippets 
TF-IDF weights, stemming, stopwords, negations 
Classification approaches
Simple logistic regression (SLR)
Naive Bayes (mNB)
3 SVM types
3 types of one-vs-all (binary) classifiers
50/50 split for training/testing
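A minimal sketch of the feature-vector step: TF-IDF weights over tokenized query representations, followed by a 50/50 split. The slides do not specify the weighting variant, so raw term frequency times log(N/df) is assumed; a query's token list would hold its text plus, depending on the representation, the tokens of its top-10 result titles and snippets.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} dict per doc."""
    n = len(docs)
    # Document frequency: number of docs in which each term occurs
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return vectors

def split_half(items):
    """First half for training, second half for testing (no shuffling here)."""
    mid = len(items) // 2
    return items[:mid], items[mid:]
```

Terms that occur in every document get weight 0 under this variant, so they carry no signal for the classifier.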
Detecting Query Sentiment 
Classification accuracy and AUC for the subjective vs. all classifiers trained with four different representations of the queries (QAll stands for QTextTitleSnippet).
Detecting Query Sentiment 
Precision-recall curves and BEPs for (a) subjective vs. all, (b) positive vs. all, and (c) negative vs. all classifiers. 
Recommender Methods 
Improving recommendations by analyzing the sentiment of the suggested queries
Our approach: opinionated suggestions
For a query q, generate query suggestions having the same sentiment class as q
Baseline: search engine suggestions
Issue q to a search engine (SE) (Nov. 2011), collect suggested and related queries
Evaluation: compare the opinionated suggestions vs. the SE suggestions
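The core filtering idea behind opinionated suggestions can be sketched as follows. The slides do not describe how candidate queries are generated, so `candidates` is assumed to be given, and `classify` stands in for a trained query sentiment classifier; the keyword-based `toy_classify` is purely illustrative.

```python
def opinionated_suggestions(query, candidates, classify):
    """Keep candidate queries whose sentiment class matches that of the query."""
    target = classify(query)
    return [c for c in candidates if c != query and classify(c) == target]

def toy_classify(query):
    """Hypothetical stand-in for a trained sentiment classifier."""
    words = set(query.lower().split())
    if words & {"bad", "worst", "terrible"}:
        return "negative"
    if words & {"good", "great", "best"}:
        return "positive"
    return "objective"
```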
Recommender Methods 
User study 
Suggested query: relevant/irrelevant/undecided
15 topics, 30 seed queries, 600 annotated suggestions 
CS researchers, AMT workers 
Query recommendation performance based on (a) in-house annotations, and (b) AMT annotations.
Recommender Methods 
Search engine’s suggestions (provided as “related queries” and “auto-completions”, the latter are shown in italics) vs. opinionated suggestions for the query “economy is really bad”.
Controversial Topic Discovery 
Classify sentiment in queries, infer controversial topics 
A toy example illustrating controversial topic detection: the procedure will output only “zen” as controversial, because its queries yield very high variance in sentiment scores, and will filter out “zendaya”, whose queries have less variance.
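The toy example can be mirrored in a short sketch: compute the variance of the query sentiment scores per topic and keep topics above a threshold. The scores, the threshold `min_variance`, and the use of population variance are assumptions for illustration, not values from the slides.

```python
from statistics import pvariance

def controversial_topics(topic_scores, min_variance):
    """Return topics ordered by decreasing variance of their query sentiment scores.

    topic_scores: dict topic -> list of sentiment scores, one per query
    min_variance: hypothetical cut-off below which a topic is filtered out
    """
    ranked = sorted(
        ((pvariance(scores), topic)
         for topic, scores in topic_scores.items() if scores),
        reverse=True,
    )
    return [topic for variance, topic in ranked if variance >= min_variance]
```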
Controversial Topic Discovery 
Topics ranked with respect to the variance in sentiment scores of their queries.
Wicca: a modern pagan religion 
cult, good, right 
fake, evil, stupid
Summary and Contributions 
Comment-Centric Feedback 
In-depth analysis of 11 million comments
Studied dependencies between comment ratings and textual content
Explored the applicability of ML techniques to detect accepted and controversial comments
Studied users exhibiting offensive behaviour 
Social Feedback 
Analysed query/query result characteristics for popular and tail queries 
Effectiveness of individual social features for LETOR 
Learning to Rank using Social Features
Summary and Contributions 
Community Sentiment in Web Queries
Studied sentiment in Web search queries
Methods able to detect the sentiment class of a query
Application 1: Query recommendation method
Application 2: Controversial topic discovery method
Publications 
Chelaru, S., Altingovde, I. S., Siersdorfer, S., and Nejdl, W. Analyzing, detecting, and exploiting sentiment in web queries. ACM Transactions on the Web 8, 1 (Dec. 2013), 6:1–6:28
Chelaru, S., Altingovde, I. S., and Siersdorfer, S. Analyzing the polarity of opinionated queries. In ECIR ’12, Springer-Verlag, pp. 463–467
Siersdorfer, S., Chelaru, S., Nejdl, W., and San Pedro, J. How useful are your comments?: analyzing and predicting YouTube comments and comment ratings. In WWW ’10, ACM, pp. 891–90
Siersdorfer, S., Chelaru, S., San Pedro, J., Altingovde, I. S., and Nejdl, W. Analyzing and mining comments and comment ratings on the social web. ACM Transactions on the Web 8 (June 2014), 17:1–17:39
Chelaru, S., Orellana-Rodriguez, C., and Altingovde, I. S. Can social features help learning to rank YouTube videos? In WISE ’12, Springer-Verlag, pp. 552–566
Chelaru, S., Orellana-Rodriguez, C., and Altingovde, I. How useful is social feedback for learning to rank YouTube videos? World Wide Web Journal (2013), 1–29
Chelaru, S., Herder, E., Djafari Naini, K., and Siehndel, P. Recognizing skill networks and their specific communication and connection practices. In HT ’14 (Accepted Paper), ACM
Demartini, G., Siersdorfer, S., Chelaru, S., and Nejdl, W. Analyzing political trends in the blogosphere. In ICWSM ’11.
Thanks 
Questions?

 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

CUbRIK research on social aspects

  • 1. CUbRIK SummerSchool2014 CUbRIK Summer School 0 Mining, Analyzing and Exploiting Community Feedback on the Web Sergiu Chelaru L3S Research Center, Hannover
  • 2. CUbRIK SummerSchool2014 2-4/07/2014 CUbRIK Summer School 1 Community Feedback on the Web Comments: a way to communicate with users and/or communities
  • 3. Outline: Comment-Centric Feedback (Comment Ratings; Polarized Content; Controversial Comments; Trolls). Social Feedback (Query Result Characteristics; Social Features; Learning to Rank using Social Features). Community Sentiment in Web Queries (Analysis of Sentiment in Web Queries; Detecting Query Sentiment; Two Application Scenarios). Summary and Contributions.
  • 4. Comment-Centric Feedback: YouTube dataset: 756 Google Zeitgeist keywords; 50 videos, metadata, 500 comments; 67k videos, 6 million comments. Yahoo! News dataset: Yahoo! RSS feed, Sept-Dec 2011; 27k news stories; 5.4 million comments. Descriptive statistics for the YouTube and Yahoo! News corpora.
  • 5. Comment-Centric Feedback: Distribution of the number of comments for videos in YouTube and news stories in Yahoo! News.
  • 6. Comment Ratings: Distribution of comment ratings for (a) YouTube and (b) Yahoo! News.
  • 7. Term Analysis of Rated Comments: Top-50 terms according to their mutual information (MI) values for accepted comments (with high comment ratings) vs. not accepted comments (with low comment ratings).
  • 8. Term Analysis of Rated Comments: Examples of comments belonging to the category “accepted”.
  • 9. Term Analysis of Rated Comments: Examples of comments belonging to the category “unaccepted”.
  • 10. Sentiment Analysis of Rated Comments: Do the language and sentiment used by the community influence comment ratings? Three disjoint partitions: 5Neg (comments with rating score r <= -5), 0Dist (comments with rating score r = 0), 5Pos (comments with rating score r >= 5). Comparison of mean senti-values for comments with different kinds of community ratings in (a) YouTube and (b) Yahoo! News.
  • 11. Ratings and Polarized Content: Variance of comment ratings as an indicator for polarizing videos.
  • 12. Ratings and Polarized Content: Variance of comment ratings as an indicator for polarizing topics.
  • 13. Predicting Comment Ratings: Classify comments into accepted by the community and not accepted. AC_POS, AC_NEG, THRESH-0. Text processing: stopword removal, stemming. $(c_1, l_1), \ldots, (c_n, l_n)$. Rating thresholds for “accepted” vs. “not accepted”. Different training set sizes T.
  • 14. Predicting Comment Ratings: Comment rating classification: break-even points (BEPs) for different training set sizes T and different rating thresholds.
  • 15. Predicting Comment Ratings: Precision-recall curves for comment rating prediction.
  • 16. Controversial Comments: In many platforms: “For some reason, a lot of you thing that rich people pay NO taxes? They pay taxes even though 50% of Americans do not. What Obama wants to do is RAISE their taxes. That’s not fair. Let’s make sure everyone pays taxes and politicians use tax money in a sensible way before we raise taxes on a few.” 10 15. comment_rating = #likes - #dislikes
  • 17. Controversial Comments: Examples of comments belonging to the categories “controversial” and “non-controversial”.
  • 18. Term Analysis of Controversial Comments: “bank”: criticized because of their role in the financial crisis; comments are approved by a large majority of the users. Top-20 terms according to their MI values for controversial vs. non-controversial comments.
  • 19. Analysis of Likes and Dislikes: Comment approval ratio: $\Phi_c = \frac{l_c}{l_c + d_c}$, where $l_c$ ($d_c$) is the number of likes (dislikes) for a comment $c$. Controversy interval: $0.5 - \delta_C \le \Phi_c \le 0.5 + \delta_C$, with $\delta_C = 0.1$. Non-controversy interval: $0.5 - \delta_{NC} \le \Phi_c \le 0.5 + \delta_{NC}$, with $\delta_{NC} \in \{0.1, 0.2, 0.3\}$.
  • 20. Analysis of Likes and Dislikes: (a) Distribution of the number of comments per comment approval interval for distinct thresholds on the number of received ratings. (b) Controversy interval vs. accepted (positive) and not accepted (negative) intervals.
  • 21. Predicting Controversial Comments: BEPs for controversial comment prediction. Note that: BEPs are relatively low; results are implementable; trading recall for precision leads to applicable results: P = 0.859 for R = 0.1. Precision-recall curve for the classification of controversial comments for $\delta_{NC} = 0.4$.
  • 22. Trolls on the Social Web: Trolls: “posting disruptive, false or offensive comments to fool and provoke other users”. Study comment rating feedback for troll/non-troll users. Study methods for automatically detecting the presence of trolls. Slashdot No More Trolls: 200 trolls, 200 non-trolls, 24 comments/user. YouTube dataset.
  • 23. CUbRIK SummerSchool2014 2-4/07/2014 CUbRIK Summer School 23 Trolls on the Social Web Johny1 Mexican, Puerto Rican, Cuban ... whocares? I love that this Negro says/ sings: "If I WERE a boy." I would feel awful about admitting being a Republican. I hope Britney Slut will die of Swine flu. I love that this Negro says/ sings: "If I WERE a boy." All I want is that she doesn't rape valuable classical songs. Even a diva like this Beyoncé doesn't have the right to commit such a crime. Johny2 you obviously have no idea what you are talking about. Shut up you douchebag. Moron.Ifthe religious groups did not subject their will on to everyone, there would not even need to be an atheist title. No one would care. Perhaps people with speak issues should be euthanized. Kindathe point there, dipshit. You are quite the ignorant fuckwit. They do look like crap, you have no idea what you're talking about. Most likely don't have the device either.Moron. Examples of troll users in YouTube (Johny1) and Slashdot (Johny2).
  • 24. Term Analysis of Troll Comments: Top-20 terms according to their MI values for troll vs. non-troll comments.
  • 25. Trolls and Community Ratings: Comment rating distribution for comments from troll users and non-troll users in (a) YouTube and (b) Slashdot.
  • 26. Content-based Troll Prediction: Linear SVM, 2-fold cross-validation. BEP: 0.68 for YouTube, 0.74 for Slashdot.
  • 27. Outline: Comment-Centric Feedback (Comment Ratings; Polarized Content; Controversial Comments; Trolls). Social Feedback (Query Result Characteristics; Social Features; Learning to Rank using Social Features). Community Sentiment in Web Queries (Analysis of Sentiment in Web Queries; Detecting Query Sentiment; Two Application Scenarios). Summary and Contributions.
  • 28. Social Feedback
  • 29. Contribution: What are the characteristics of YouTube query results with respect to the social features? How effective is each individual feature for ranking the videos for a given query? Can social features help improve video retrieval performance in a learning to rank (LETOR) framework?
  • 30. Data Collection: Query sets: 1.4k popular queries ($Q_p$); 1.3k tail queries ($Q_t$). Video sets: $V_p$: 132k videos retrieved for $Q_p$; $V_t$: 63k videos retrieved for $Q_t$.
  • 31. Query Result Characteristics: Category distribution of (a) popular and (b) tail queries.
  • 32. Query Result Characteristics: Number of results (reported by YouTube) for (a) popular and (b) tail queries.
  • 33. Query Result Characteristics: Average number of (a) views, (b) likes, (c) dislikes, and (d) comments vs. video rank in the query results.
  • 34. Data Annotation: 100 queries, 100 videos/query => 10k videos.
  • 35. Basic and Social Features: The list of all the basic and social features (F) employed in our work.
  • 36. Effectiveness of Features: Fraction of queries for which a given feature yields the ranking with the highest NDCG@10 for (a) popular and (b) tail queries.
  • 37. Video Retrieval Framework: 7 LETOR algorithms. Feature selection: GAS, MMR. (q, F, r). 5-fold cross-validation. NDCG@10, NDCG@5. Diagram labels: Train 7 LETOR Models; Run Prediction Models; Build k-dimensional Query-Video Pairs; NDCG; Top-k Feature Selection; Train Queries+Videos; Test Queries+Videos; for k ∈ {1, ..., # features}.
  • 38. LETOR Results for Popular+Tail: Average NDCG@10 scores for LETOR algorithms using the basic and best-k features obtained with the GAS and MMR strategies for the popular and tail query sets (for bold cases, differences from the baseline are statistically significant). For GAS and MMR, we also denote the number of selected features (k) in parentheses.
  • 39. Outline: Comment-Centric Feedback (Comment Ratings; Polarized Content; Controversial Comments; Trolls). Social Feedback (Query Result Characteristics; Social Features; Learning to Rank using Social Features). Community Sentiment in Web Queries (Analysis of Sentiment in Web Queries; Detecting Query Sentiment; Two Application Scenarios). Summary and Contributions.
  • 40. Contribution: Analysis of sentiment in Web queries. Study the applicability of state-of-the-art sentiment analysis methods for detecting the sentiment of queries. Employ query sentiment detectors in two use cases: query recommendation and controversial topic discovery.
  • 41. What is Sentiment Analysis: Examples of positive (top) and negative (bottom) opinionated reviews for the movie Madagascar 3: Europe’s Most Wanted.
  • 42. Data Collection: 50 controversial topics from procon.org and Wikipedia (e.g., abortion, iphone, marijuana). AOL query log: 31,053 queries; 7,651 annotated queries. Templates for gathering queries (along with the number of manually annotated queries per template).
  • 43. Sentiment in Web Queries: Queries and sentiment categories for the topic “George Bush”.
  • 44. Sentiment in Query Results: Traces of bias in top-k query results. 60 queries, 600 titles, 600 snippets. Sentiment distribution of (a) query result titles and (b) query result snippets for the queries from each sentiment class.
  • 45. Post-Retrieval Analysis: Post-retrieval behaviour of the user. MSN log, 5 topics, 1.5k queries, 79 opinionated, 222 clicked pages. Sentiment distribution of the clicked results for (a) positive queries and (b) negative queries.
  • 46. Detecting Query Sentiment: Study state-of-the-art methods to detect the sentiment class of a query. Feature vectors: query text, top-10 result titles and snippets; TF-IDF weights, stemming, stopwords, negations. Classification approaches: simple logistic regression (SLR), Naive Bayes (mNB), 3 SVM types. 3 types of one-vs-all (binary) classifiers. 50/50 split for training/testing.
  • 47. Detecting Query Sentiment: Classification accuracy and AUC for the subjective vs. all classifiers trained with four different representations of the queries (QAll stands for QTextTitleSnippet).
  • 48. Detecting Query Sentiment: Precision-recall curves and BEPs for (a) subjective vs. all, (b) positive vs. all, and (c) negative vs. all classifiers.
  • 49. Recommender Methods: Improvement of recommendations by analyzing the sentiment of the suggested query. Our approach: opinionated suggestions. For a query q, generate query suggestions having the same sentiment class as q. Baseline: search engine (SE) suggestions. Issue q to a SE (Nov. 2011), collect suggested and related queries. Evaluation: compare the opinionated suggestions vs. the SE suggestions.
  • 50. Recommender Methods: User study. Suggested query: relevant/irrelevant/undecided. 15 topics, 30 seed queries, 600 annotated suggestions. CS researchers, AMT workers. Query recommendation performance based on (a) in-house annotations and (b) AMT annotations.
  • 51. Recommender Methods: Search engine’s suggestions (provided as “related queries” and “auto-completions”; the latter are shown in italics) vs. opinionated suggestions for the query “economy is really bad”.
  • 52. Controversial Topic Discovery: Classify sentiment in queries, infer controversial topics. A toy example illustrating controversial topic detection: the procedure outputs only “zen” as controversial, because it yields very high variance in query sentiment scores, and filters out “zendaya”, because its queries have less variance.
  • 53. Controversial Topic Discovery: Topics ranked with respect to the variance in sentiment scores of their queries. Wicca: a modern pagan religion; cult, good, right; fake, evil, stupid.
  • 54. Summary and Contributions: Comment-Centric Feedback: in-depth analysis of 11 million comments; studied dependencies between comment ratings and textual content; explored the applicability of ML techniques to detect accepted and controversial comments; studied users exhibiting offensive behaviour. Social Feedback: analysed query and query result characteristics for popular and tail queries; effectiveness of individual social features for LETOR; learning to rank using social features.
  • 55. Summary and Contributions: Community Sentiment in Web Queries: studied sentiment in Web search queries; methods able to detect the sentiment class of a query; Application 1: query recommendation method; Application 2: controversial topic discovery method.
  • 56. Publications: [1] Chelaru, S., Altingovde, I. S., Siersdorfer, S., and Nejdl, W. Analyzing, detecting, and exploiting sentiment in web queries. ACM Transactions on the Web 8, 1 (Dec. 2013), 6:1–6:28. [2] Chelaru, S., Altingovde, I. S., and Siersdorfer, S. Analyzing the polarity of opinionated queries. In ECIR ’12, Springer-Verlag, pp. 463–467. [3] Siersdorfer, S., Chelaru, S., Nejdl, W., and San Pedro, J. How useful are your comments?: analyzing and predicting youtube comments and comment ratings. In WWW ’10, ACM, pp. 891–90. [4] Siersdorfer, S., Chelaru, S., San Pedro, J., Altingovde, I. S., and Nejdl, W. Analyzing and mining comments and comment ratings on the social web. ACM Transactions on the Web 8 (June 2014), 17:1–17:39. [5] Chelaru, S., Orellana-Rodriguez, C., and Altingovde, I. S. Can social features help learning to rank youtube videos? WISE ’12, Springer-Verlag, pp. 552–566. [6] Chelaru, S., Orellana-Rodriguez, C., and Altingovde, I. How useful is social feedback for learning to rank youtube videos? World Wide Web Journal (2013), 1–29. [7] Chelaru, S., Herder, E., Djafari Naini, K., and Siehndel, P. Recognizing skill networks and their specific communication and connection practices. In HT ’14 (Accepted Paper), ACM. [8] Demartini, G., Siersdorfer, S., Chelaru, S., and Nejdl, W. Analyzing political trends in the blogosphere. ICWSM ’11.
  • 57. Thanks. Questions?
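
The comment rating from slide 16 and the comment approval ratio and controversy interval from slide 19 can be illustrated with a short Python sketch. The function names and the handling of comments without any ratings are assumptions made for illustration; the slides do not describe the actual implementation. Only the controversy interval with its default width of 0.1 is covered.

```python
def comment_rating(likes: int, dislikes: int) -> int:
    """Net comment rating as written on slide 16: #likes - #dislikes."""
    return likes - dislikes


def approval_ratio(likes: int, dislikes: int) -> float:
    """Comment approval ratio from slide 19: likes / (likes + dislikes)."""
    total = likes + dislikes
    if total == 0:
        # Assumption: the slides do not say how unrated comments are treated,
        # so this sketch simply rejects them.
        raise ValueError("approval ratio is undefined without any ratings")
    return likes / total


def is_controversial(likes: int, dislikes: int, delta_c: float = 0.1) -> bool:
    """True when the approval ratio lies in [0.5 - delta_c, 0.5 + delta_c]."""
    phi = approval_ratio(likes, dislikes)
    return 0.5 - delta_c <= phi <= 0.5 + delta_c


if __name__ == "__main__":
    # Hypothetical like/dislike counts, not taken from the datasets.
    for likes, dislikes in [(12, 13), (90, 10), (3, 1)]:
        print(
            likes,
            dislikes,
            comment_rating(likes, dislikes),
            round(approval_ratio(likes, dislikes), 3),
            is_controversial(likes, dislikes),
        )
```

A comment with nearly balanced likes and dislikes falls inside the interval, while one that is mostly liked falls outside it. Note that such a comment can still have a net rating close to zero, which is why the ratio carries information that the rating alone does not.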