Crowdsourcing a News Query Classification Dataset
Richard McCreadie, Craig Macdonald & Iadh Ounis
Introduction
News query classification is a binary classification task performed by Web search engines: decide whether a query is seeking news content. Up to 10% of queries may be news-related [Bar-Ilan et al., 2009]. For example, during a breaking story a user issuing the query `gunman` likely wants news results, while a non-news-related query should return standard Web search results.
Introduction
But crowdsourcing these labels is difficult:
- Workers can easily `game` the task, e.g. by answering at random:

    for query in task:
        return Random(Yes, No)

- News queries change over time. Is the query `Octopus` about a sea creature, or about World Cup predictions (news-related)?

How can we overcome these difficulties to create a high-quality dataset for news query classification?
Talk Outline
- Dataset Construction Methodology
- Research Questions and Setting
- Experiments and Results
- Conclusions
Dataset Construction Methodology
- Sample queries from the MSN May 2006 query log
- Create gold judgments to validate the workers
- Propose additional content to tackle the temporal nature of news queries, and prototype interfaces to evaluate this content on a small test set
- Create the final labels using the best setting and interface
- Evaluate in terms of agreement, and against `expert` judgments
Dataset Construction Methodology
Create two query sets, Poisson-sampled from the MSN May 2006 query log:
- testset (91 queries): used for prototyping, giving fast crowdsourcing turnaround time at very low cost
- fullset (1,206 queries): roughly 10x larger, labelled only once, used for the final dataset

Example log entries:
    Date        Time      Query
    2006-05-01  00:00:08  What is May Day?
    2006-05-08  14:43:42  protest in Puerto rico
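The sampling step above can be sketched in Python. This is a minimal illustration, not the authors' code: a list of strings stands in for the MSN query log, the `poisson_sample` helper and the rates are assumptions chosen so the expected sizes roughly match the testset and fullset.

```python
import math
import random

random.seed(42)  # fixed seed so this sketch is reproducible

def poisson_sample(queries, rate):
    """Poisson-sample an ordered query stream: draw exponentially
    distributed gaps between picks, so every query has the same
    small, independent chance of being selected."""
    sample, i = [], 0
    while True:
        # exponential inter-arrival gap -> index of the next pick
        i += int(-math.log(1.0 - random.random()) / rate) + 1
        if i > len(queries):
            return sample
        sample.append(queries[i - 1])

log = [f"query-{n}" for n in range(100_000)]   # stand-in for the real log
testset = poisson_sample(log, rate=0.001)      # expect ~100 queries (cf. 91)
fullset = poisson_sample(log, rate=0.012)      # expect ~1,200 queries (cf. 1,206)
```

Because each query is picked independently, the sample preserves the temporal order of the log, which matters for a news task.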
Dataset Construction Methodology
Gold judgments (honey-pot):
- A small set (5%) of `cherry-picked`, unambiguous queries, focused on news-related queries
- Used to catch out bad workers early in the task
Multiple workers per query: 3 workers, taking the majority result.

Example gold judgments:
    Date        Time      Query                   Validation
    2006-05-01  00:00:08  What is May Day?        No
    2006-05-08  14:43:42  protest in Puerto rico  Yes
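The honey-pot validation and the majority vote can be sketched as follows. The gold labels are the two examples from the slide; the `worker_passes` helper and its 0.7 accuracy threshold are illustrative assumptions, not the authors' actual rejection rule.

```python
from collections import Counter

# Gold (honey-pot) labels: a small set of cherry-picked, unambiguous
# queries whose answers are known in advance (examples from the slide).
GOLD = {
    "What is May Day?": "No",
    "protest in Puerto rico": "Yes",
}

def worker_passes(judgments, min_accuracy=0.7):
    """Reject a worker whose answers on the hidden gold queries fall
    below a minimum accuracy (the 0.7 threshold is an assumption)."""
    graded = [(q, label) for q, label in judgments.items() if q in GOLD]
    if not graded:
        return True  # worker has not yet seen a gold query
    correct = sum(label == GOLD[q] for q, label in graded)
    return correct / len(graded) >= min_accuracy

def majority_label(labels):
    """Collapse the 3 worker labels for a query into a majority result."""
    return Counter(labels).most_common(1)[0][0]
```

With 3 workers per query and a binary label, a majority always exists, so no tie-breaking rule is needed.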
Dataset Construction Methodology
Workers need to know what the news stories of the time were, but they will likely not remember the main stories of May 2006. Candidate additional content:
- News headlines
- News summaries
- Web search results
We evaluate these on the small testset, to keep costs and turnaround time low, and see which works best.
Interfaces: Basic. Shows the query and its date, explains what the workers need to do, clarifies news-relatedness, and asks for a binary label.
Interfaces: Headline. Adds 12 news headlines from the New York Times. But will the workers bother to read these?
Interfaces: HeadlineInline. The same 12 New York Times headlines, shown inline. But maybe headlines are not enough?
Interfaces: HeadlineSummary. Pairs each news headline with a news summary (e.g. for the query `Tigers of Tamil?`).
Interfaces: LinkSupported. Provides links to three major search engines; each link triggers a search containing the query and its date. This also gathers some additional feedback from workers.
Dataset Construction Methodology
Evaluation:
- Agreement between the three workers per query: the more the workers agree, the more confident we can be that the resulting majority label is correct.
- Comparison with `expert` (the author's) judgments: how many of the queries that the workers judged news-related match the ground truth?

Example:
    Date        Time      Query                   Worker  Expert
    2006-05-05  07:31:23  abcnews                 Yes     No
    2006-05-08  14:43:42  protest in Puerto rico  Yes     Yes
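Both evaluation views can be sketched in a few lines of Python. This is a minimal illustration under the assumption that `Yes` (news-related) is the positive class; the function names are mine, not the paper's.

```python
def full_agreement(labels_per_query):
    """Fraction of queries on which all three workers gave the same label."""
    unanimous = sum(1 for labels in labels_per_query if len(set(labels)) == 1)
    return unanimous / len(labels_per_query)

def precision_recall(worker_labels, expert_labels):
    """Precision/recall of worker majority labels against expert labels,
    treating 'Yes' (news-related) as the positive class."""
    pairs = list(zip(worker_labels, expert_labels))
    tp = sum(1 for w, e in pairs if w == e == "Yes")
    fp = sum(1 for w, e in pairs if w == "Yes" and e == "No")
    fn = sum(1 for w, e in pairs if w == "No" and e == "Yes")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```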
Research Questions and Setting
Judgments per query: 3
How is our Baseline?
Validation is very important: 32% of judgments were rejected. 20% of those were completed VERY quickly: bots? Watch out for bursty judging, and for new users.

Measures:
- Precision: the % of queries labelled as news-related that agree with our ground truth
- Recall: the % of all news-related queries that the workers labelled correctly
- Accuracy: a combined measure (assumes that the workers labelled non-news-related queries correctly)
- Kfree: kappa agreement assuming that workers would label randomly
- Kfleiss: kappa agreement assuming that workers will label according to the class distribution

As expected, the baseline (Basic interface) is fairly poor, i.e. agreement between workers per query is low (25-50%).
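Of the two kappa measures above, Kfree is the simpler one to sketch: Randolph's free-marginal kappa fixes chance agreement at 1/k for k labels, i.e. workers are assumed to guess uniformly at random. Kfleiss is analogous but estimates chance agreement from the observed label distribution; only Kfree is shown here, and the function name is mine.

```python
def kappa_free(ratings, k=2):
    """Free-marginal kappa (the slide's Kfree) for a fixed number of
    raters per item: chance agreement is fixed at 1/k."""
    n = len(ratings[0])  # raters per query (3 in this task)

    def pairwise_agreement(labels):
        # fraction of agreeing rater pairs for one query
        counts = [labels.count(c) for c in set(labels)]
        return (sum(c * c for c in counts) - n) / (n * (n - 1))

    p_obs = sum(pairwise_agreement(ls) for ls in ratings) / len(ratings)
    p_chance = 1.0 / k
    return (p_obs - p_chance) / (1.0 - p_chance)
```

A value of 1.0 means perfect agreement; 0 means agreement no better than uniform guessing, which is exactly the bot behaviour the validation step is designed to catch.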
Results (testset vs. fullset)
- Recall: the workers got all of the news-related queries right!
- Precision: lower, since the workers found other queries to be news-related.
- Agreement: the workers are maybe learning the task? The majority of the work was done by 3 users.
High recall and agreement indicate that the labels are of high quality.
Conclusions and Best Practices
We are confident that our dataset is reliable, since agreement is high.
- Online worker validation is paramount: catch out bots and lazy workers to improve agreement.
- Provide workers with additional information to help improve labelling quality.
- Workers can learn: running large single jobs may allow workers to become better at the task.
Questions?

Editor's notes

  1. Poisson sampling: the literature says it is representative
  2. At the beginning: overriding hypothesis
  3. Too late: $2
  4. Precision: the number of
  5. Worker overlap is only 1/3
  6. Precision: the number of
  7. More detail and link backwards. Graph missing. How confident in the test collection: looks reliable