Building a Microblog Corpus
for Search Result Diversification
AIRS 2013, Singapore, December 10

Ke Tao, Claudia Hauff, Geert-Jan Houben
Web Information Systems, TU Delft, the Netherlands

Delft University of Technology
Research Challenges
1. Diversification needed: Users tend to issue short, underspecified
queries when searching on microblogging platforms

2. Lack of a corpus for diversification studies: How can one build a
microblog corpus for evaluating search result diversification?
[Diagram: a query retrieves tweets into a search result; a diversification strategy, guided by diversity judgments, produces a diversified result.]

Methodology

Overview

1. Data Source
• How can we find a representative Twitter dataset?

2. Topic Selection
• How do we select the search topics?

3. Tweets Pooling
• Which tweets are we going to annotate?

4. Diversity Annotation
• How do we annotate the tweets with diversity characteristics?
Methodology – Data source
• From where?
• Twitter sampling API  around 1% of the whole Twitter stream

• Duration
• From February 1st to March 31st, 2013
• Coincides with the TREC 2013 Microblog Track

• Tools
• Twitter Public Stream Sampling Tools by @lintool
• Amazon EC2 in EU
TREC 2013 Microblog Track Guidelines: https://github.com/lintool/twitter-tools/wiki/TREC-2013-Track-Guidelines
Twitter Public Stream Sampling Tool: https://github.com/lintool/twitter-tools/wiki/Sampling-the-public-Twitter-stream
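
For illustration, a minimal sketch of consuming the 1% sample, assuming the Streaming API v1.1 statuses/sample endpoint that was available in 2013 and OAuth 1.0a credentials; the actual collection used @lintool's tool above:

```python
import json

import requests
from requests_oauthlib import OAuth1

# Hypothetical credentials; fill in your own application's keys.
auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",
              "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# statuses/sample delivered roughly 1% of all public tweets.
URL = "https://stream.twitter.com/1.1/statuses/sample.json"

with requests.get(URL, auth=auth, stream=True) as resp, \
        open("sample.jsonl", "a") as out:
    for line in resp.iter_lines():
        if not line:
            continue  # keep-alive newline
        status = json.loads(line)
        if "text" in status:  # skip delete/limit notices
            out.write(json.dumps(status) + "\n")
```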

Methodology – Topic Selection

How do we select the search topics?
• Candidates from the Wikipedia Current Events Portal
• Sufficiently important
• More than local interest

• Temporal Characteristics
• Evenly distributed over the 2-month period
• Enables further analysis of temporal characteristics

• Selected
• 50 topics on trending news events
Wikipedia Current Events Portal: http://en.wikipedia.org/wiki/Portal:Current_events

Methodology – Tweets Pooling – 1/2

Maximize coverage & Minimize effort
• Challenge in adopting the existing pooling approach
• No access to multiple retrieval systems

• Topic Expansion
• Manually created queries for each topic
• Aiming for maximum coverage of tweets relevant to the topic

• Duplicate Filtering
• Filter out duplicate tweets (cosine similarity > 0.9); a sketch follows below
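
A sketch of this filtering step, assuming plain term-frequency vectors over the tweet text (the exact term weighting is not specified on the slide):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def filter_duplicates(tweets, threshold=0.9):
    """Keep a tweet only if its cosine similarity to every
    previously kept tweet stays at or below the threshold."""
    vectors = CountVectorizer().fit_transform(tweets)
    sims = cosine_similarity(vectors)
    kept = []
    for i in range(len(tweets)):
        if all(sims[i, j] <= threshold for j in kept):
            kept.append(i)
    return [tweets[i] for i in kept]
```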

Methodology – Tweets Pooling – 2/2

Topic Expansion Example

Topic: "Hillary Clinton steps down as United States Secretary of State"
 possible variety of expressions (see the hypothetical sketch below)
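
A hypothetical illustration of such an expanded query; the variants below are invented for this sketch, the actual queries were hand-crafted by the authors:

```python
# Invented variants for illustration only.
VARIANTS = [
    "hillary clinton steps down",
    "hillary clinton resigns",
    "clinton leaves state department",
    "secretary of state clinton successor",
]

def matches_topic(tweet_text):
    """A tweet enters the pool if it matches any variant (OR semantics)."""
    text = tweet_text.lower()
    return any(all(term in text for term in variant.split())
               for variant in VARIANTS)
```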

Methodology – Diversity Annotation

Annotation Efforts

• 500 tweets for each topic
• No identification of subtopics beforehand
• Tweets about the general topic only (no added value) are judged non-relevant

• No further check on URLs  linked pages may become unavailable over time

• 50 topics split between 2 annotators
• Subjective process
• Comparative results presented later
• 3 topics dropped (e.g., not enough diversity or too few relevant documents)

Topic Analysis

The Topics and Subtopics 1/2
                        All topics   Annotator 1   Annotator 2
Avg. #subtopics             9.27         8.59          9.88
Std. dev. #subtopics        3.88         5.11          2.14
Min. #subtopics             2            2             6
Max. #subtopics             21           21            13

On average, we found about 9 subtopics per topic. The subjectivity
of the annotation is confirmed by the difference between the two
annotators in the standard deviation of the number of subtopics
per topic.
Topic Analysis

The Topics and Subtopics 2/2

On average, the annotators spent 6.6 seconds to annotate a tweet.
Most tweets are assigned exactly one subtopic.
Topic Analysis

The Relevance Judgments 1/2
• Diversity differs across topics
• 25 topics have fewer than 100 tweets with subtopics
• 6 topics have more than 350 tweets with subtopics

• Difference between the 2 annotators
• On average, 96 vs. 181 tweets with subtopic assignments

[Chart: per-topic counts of RELEVANT vs. NONRELEVANT documents; y-axis: number of documents (0-500), x-axis: topics.]

Topic Analysis

The Relevance Judgments 2/2
• Temporal persistence
• Some topics are active during the entire timespan
• Northern Mali conflict
• Syrian civil war

• As short as 24 hours for other topics
• BBC Twitter account hacked
• Eiffel Tower evacuated due to a bomb threat
[Chart: temporal persistence per topic; y-axis: difference in days (0-60), x-axis: topics.]
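
A sketch of one way to compute this persistence, assuming it is the span between the first and last relevant tweet of a topic and that tweets carry Twitter's created_at timestamp:

```python
from datetime import datetime

TIME_FMT = "%a %b %d %H:%M:%S +0000 %Y"  # Twitter's created_at format

def persistence_days(relevant_tweets):
    """Days between the earliest and latest relevant tweet."""
    times = [datetime.strptime(t["created_at"], TIME_FMT)
             for t in relevant_tweets]
    return (max(times) - min(times)).days
```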

Topic Analysis

Diversity Difficulty
• How difficult is it to diversify the search results?
• Depends on the ambiguity or under-specification of topics
• and on the diverse content available in the corpus

• Golbus et al. proposed the diversity difficulty measure dd
• dd > 0.9: a diverse query; an arbitrary ranked list is likely to cover all subtopics
• dd < 0.5: subtopics are hard to discover with an untuned retrieval system
                                  All topics   Annotator 1   Annotator 2
Avg. diversity difficulty           0.71         0.72          0.70
Std. dev. diversity difficulty      0.07         0.06          0.07

Golbus et al.: Increasing Evaluation Sensitivity to Diversity. Information Retrieval 16 (2013).

Topic Analysis

Diversity Difficulty (cont.)
• Difference between long- and short-term topics
• Topics with a longer timespan (>50 days) are easier in terms of diversity
difficulty (0.73 vs. 0.70)
Building a Microblog Corpus for Search Result Diversification

14
Diversification by De-Duplicating – 1/6

Lower redundancy, but higher diversity?

• In previous work, we were motivated by the observation that
• 20% of search results contain duplicate information to varying extents

• Therefore, we proposed to remove duplicates in order to achieve
lower redundancy in the top-k results
• Implemented within a machine learning framework
• Makes use of syntactical, semantic, and contextual features
• Eliminates identified duplicates at lower ranks in the search results
(a sketch follows below)

But can it also achieve higher diversity?
Tao et al.: Groundhog Day: Near-Duplicate Detection on Twitter. In Proceedings of the 22nd International World Wide Web Conference (WWW), 2013.
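
A minimal sketch of the elimination step, with is_duplicate standing in for the learned pairwise detector described in the WWW 2013 paper:

```python
def deduplicate_ranking(ranked_tweets, is_duplicate):
    """Walk the ranking from the top; drop a tweet if the detector
    flags it as a duplicate of any higher-ranked tweet we kept."""
    kept = []
    for tweet in ranked_tweets:  # best-ranked first
        if not any(is_duplicate(prev, tweet) for prev in kept):
            kept.append(tweet)
    return kept
```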

Diversification by De-Duplicating – 2/6

Measures

• We adopt the following measures (sketches of three follow below):
• alpha-(n)DCG
• Precision-IA
• Subtopic Recall
• Redundancy

Clarke et al.: Novelty and Diversity in Information Retrieval Evaluation. In Proceedings of SIGIR, 2008.
Agrawal et al.: Diversifying Search Results. In Proceedings of WSDM, 2009.
Zhai et al.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In Proceedings of SIGIR, 2003.
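
Hedged sketches of alpha-(n)DCG, Precision-IA, and Subtopic Recall over per-document subtopic judgments; here subtopic_sets[i] is assumed to be the set of subtopic IDs (0..n_subtopics-1) covered by the i-th ranked document, and the ideal ranking for normalization uses the standard greedy approximation:

```python
import math
from itertools import islice

def alpha_dcg(subtopic_sets, alpha=0.5, k=10):
    """A subtopic's gain decays by a factor (1 - alpha) per repetition."""
    seen = {}  # subtopic -> number of earlier documents covering it
    score = 0.0
    for rank, subs in enumerate(islice(subtopic_sets, k), start=1):
        gain = sum((1 - alpha) ** seen.get(s, 0) for s in subs)
        score += gain / math.log2(rank + 1)
        for s in subs:
            seen[s] = seen.get(s, 0) + 1
    return score

def alpha_ndcg(subtopic_sets, judged_sets, alpha=0.5, k=10):
    """Normalize by a greedily built (approximately ideal) ranking
    over all judged documents."""
    pool, ideal, seen = list(judged_sets), [], {}
    while pool and len(ideal) < k:
        best = max(pool, key=lambda subs: sum(
            (1 - alpha) ** seen.get(s, 0) for s in subs))
        pool.remove(best)
        ideal.append(best)
        for s in best:
            seen[s] = seen.get(s, 0) + 1
    denom = alpha_dcg(ideal, alpha, k)
    return alpha_dcg(subtopic_sets, alpha, k) / denom if denom else 0.0

def precision_ia(subtopic_sets, n_subtopics, k=10):
    """Mean, over subtopics, of precision@k w.r.t. that subtopic."""
    top = list(islice(subtopic_sets, k))
    return sum(sum(1 for subs in top if s in subs) / k
               for s in range(n_subtopics)) / n_subtopics

def subtopic_recall(subtopic_sets, n_subtopics, k=10):
    """Fraction of all subtopics covered in the top k."""
    covered = set().union(*islice(subtopic_sets, k))
    return len(covered) / n_subtopics
```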
Diversification by De-Duplicating – 3/6

Baseline and De-Duplication Strategies
• Baseline Strategies
• Automatic run: standard queries (no more than 3 terms)
• Filtered Automatic run: duplicates filtered out by cosine similarity

• Manual run: manually created complex queries, with automatic filtering

• De-duplication strategies (feature-group combinations sketched below)
• Sy = syntactical, Se = semantic, Co = contextual
• Four strategies: Sy, SyCo, SySe, SySeCo
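
A hedged sketch of how the four strategies might combine feature groups for a pairwise duplicate classifier; every feature function below is an illustrative stand-in, not the exact feature set of the WWW 2013 paper:

```python
# Illustrative feature groups; stand-ins for the paper's actual features.
def syntactic_features(t1, t2):
    w1, w2 = set(t1["text"].lower().split()), set(t2["text"].lower().split())
    return [len(w1 & w2) / max(len(w1 | w2), 1)]  # word (Jaccard) overlap

def semantic_features(t1, t2):
    # e.g., overlap of extracted entities; "entities" is a hypothetical field
    e1, e2 = set(t1.get("entities", [])), set(t2.get("entities", []))
    return [len(e1 & e2) / max(len(e1 | e2), 1)]

def contextual_features(t1, t2):
    return [float(t1["user"] == t2["user"])]  # same author?

STRATEGIES = {
    "Sy":     [syntactic_features],
    "SyCo":   [syntactic_features, contextual_features],
    "SySe":   [syntactic_features, semantic_features],
    "SySeCo": [syntactic_features, semantic_features, contextual_features],
}

def feature_vector(strategy, t1, t2):
    """Concatenate the feature groups selected by the strategy."""
    return [x for group in STRATEGIES[strategy] for x in group(t1, t2)]
```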

Diversification by De-Duplicating – 4/6

Overall comparison

Overall, the de-duplication strategies did achieve lower
redundancy. However, they did not achieve higher diversity.
Diversification by De-Duplicating – 5/6

Influence of Annotator Subjectivity

The same general trends hold for both annotators. alpha-nDCG
scores are higher for Annotator 2, which can be explained by
Annotator 2 judging more documents as relevant on average.

Diversification by De-Duplicating – 6/6

Influence of Temporal Persistence

De-duplication strategies can help for long-term topics: their
vocabulary is richer, whereas short-term topics use only a small
set of terms.

Conclusions
• What we have done:

• Created a microblog-based corpus for search result diversification
• Conducted a comprehensive analysis and showed its suitability
• Confirmed considerable subjectivity among annotators, although the trends
w.r.t. the different evaluation measures were largely independent of the
annotator

• We have made the corpus available at:
• http://wis.ewi.tudelft.nl/airs2013/

• What we will do:

• Apply diversification approaches that have been shown to perform well
in the Web search setting
• Propose diversification approaches specifically designed for search on
microblogging platforms
Thank you!
@wisdelft
http://ktao.nl

Ke Tao
@taubau



Speaker Notes

  1. Animation illustrating the lack of a corpus
  2. 3 dropped topics: G20 finance ministers meeting, UEFA Champions League, North Korea nullifying the armistice
  3. Basic statistics
  4. 6.6 seconds: annotators start slowly and get faster later. Most tweets are assigned exactly one subtopic.
  5. Unlike the TREC 2011/12 subtopics, no timestamps were considered when building this corpus.
  6. Diversity difficulty: TREC 2010: 0.727 (0.449, 0.994); TREC 2011: 0.809 (0.643, 0.977)
  7. Diversity difficulty
  8. Diversity difficulty