3. Crowds or no crowds?
• Study different ways to crowdsource entity typing using paid microtasks
• Three workflows
  – Free associations
  – Validating the machine
  – Exploring the DBpedia ontology
5. What to crowdsource (2)
• Entity typing (from a list of suggestions)
[Figure: example microtask showing an entity (E) and suggested classes: City, SportsTeam, Municipality, PopulatedPlace (C)]
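To make the task concrete, here is a minimal sketch of how one such microtask could be represented before being posted to a crowdsourcing platform; the field names and the example entity are illustrative assumptions, not the task schema used in the study.

```python
# Hypothetical microtask payload for entity typing from a list of
# suggestions; field names and the entity are illustrative only.
task = {
    "entity": "http://dbpedia.org/resource/Southampton",
    "question": "Which class best describes this entity?",
    "suggestions": ["City", "SportsTeam", "Municipality", "PopulatedPlace"],
    "allow_none": True,  # let workers flag that no suggested class fits
}
```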
6. How to crowdsource: no suggestions
Workflow: ask crowd to suggest classes → take top k → ask crowd to vote for the best match (see the sketch below)
Pros/cons
+ No biases
+ No pre-processing
– Vocabulary convergence
– Time and costs
– Needs many classifications (the more the better)
– Two steps
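A minimal Python sketch of how this two-step workflow's answers could be aggregated, assuming the free-text suggestions have already been normalized to comparable labels (that normalization is exactly the vocabulary-convergence problem noted above); k and the function names are assumptions, not the paper's implementation.

```python
from collections import Counter

def top_k_suggestions(free_text_answers, k=3):
    # Step 1: count free-text class suggestions from the first crowd
    # and keep the k most frequent ones as the shortlist.
    return [label for label, _ in Counter(free_text_answers).most_common(k)]

def best_match(votes):
    # Step 2: a second crowd votes on the shortlist; take the plurality winner.
    return Counter(votes).most_common(1)[0][0]

# Toy run with already-normalized suggestions for one entity:
shortlist = top_k_suggestions(
    ["city", "town", "city", "port", "populated place", "city"])
print(shortlist)                                         # ['city', 'town', 'port']
print(best_match(["city", "city", "populated place"]))   # 'city'
```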
7. How to crowdsource: with suggestions
Two options
• Generate a shortlist
  – Automatically
• Show all available options
  – As a tree (see the sketch below)
Pros/cons
+ Focused, cheap, fast
– Too many classes (685!), see [Miller, 1956]
– Not the right classes
– Tool does not perform well
– Crowd is not familiar with classes, see [Rosch et al., 1976], [Tanaka & Taylor, 1991]
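The slides do not say how the class tree was rendered; as a rough sketch under that caveat, the DBpedia ontology hierarchy can be pulled from the public SPARQL endpoint via rdfs:subClassOf and printed as an indented tree (the endpoint, query, and rendering below are assumptions, not the tool evaluated in the study):

```python
import requests
from collections import defaultdict

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"  # public endpoint, assumed here
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?child ?parent WHERE {
  ?child rdfs:subClassOf ?parent .
  FILTER(STRSTARTS(STR(?child),  "http://dbpedia.org/ontology/"))
  FILTER(STRSTARTS(STR(?parent), "http://dbpedia.org/ontology/"))
}
"""

def fetch_children():
    # Map each DBpedia ontology class to its direct subclasses.
    resp = requests.get(SPARQL_ENDPOINT,
                        params={"query": QUERY,
                                "format": "application/sparql-results+json"})
    resp.raise_for_status()
    children = defaultdict(list)
    for b in resp.json()["results"]["bindings"]:
        parent = b["parent"]["value"].rsplit("/", 1)[-1]
        child = b["child"]["value"].rsplit("/", 1)[-1]
        children[parent].append(child)
    return children

def print_tree(children, node="Place", depth=0):
    # Indented view of the subtree rooted at `node`.
    print("  " * depth + node)
    for c in sorted(children.get(node, [])):
        print_tree(children, c, depth + 1)

print_tree(fetch_children())  # e.g. Place -> PopulatedPlace -> ... -> City
```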
10. Experiments: Data
• E1: Baseline, 120 entities
  – Classified entities in popular categories
  – Test workflows, compare crowd and machine performance
• E2: Unclassified entities, 120 entities
  – Test the three workflows on data that cannot be classified automatically
• E3: Unclassified entities, optimized, 120 entities
  – Fewer judgements
  – Lower level of tool support
11. Experiments: Methods
• Adjusted precision metric to take into account broader and narrower matches, as well as synonyms (see the sketch below)
• Gold standard (for E2 and E3)
  – Two annotators, Cohen's kappa of 0.7
  – Conflicts resolved via a small set of rules and discussions
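The slides give the idea behind the adjusted precision metric but not its exact definition; here is a minimal sketch assuming full credit for exact matches and synonyms and partial credit for broader/narrower matches, with hypothetical weights:

```python
# Hypothetical partial-credit weights; the paper's exact scheme is not
# given in the slides.
CREDIT = {"exact": 1.0, "synonym": 1.0, "broader": 0.5, "narrower": 0.5, "wrong": 0.0}

def adjusted_precision(match_types):
    """match_types: one label per crowd answer, judged against the gold
    standard, e.g. ["exact", "broader", "wrong"]."""
    if not match_types:
        return 0.0
    return sum(CREDIT[m] for m in match_types) / len(match_types)

# Example: 3 exact, 1 broader, 1 wrong -> (3*1.0 + 0.5 + 0.0) / 5 = 0.7
print(adjusted_precision(["exact", "exact", "exact", "broader", "wrong"]))
```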
12. Overall results
• Shortlists are easy & fast
• Freedom comes with a price
• Working at the basic level of abstraction achieves greatest precision
  – Even when there is too much choice
13. Other observations
• Unclassified entities might be unclassifiable
  – Different entity summary
  – Free-text or exploratory workflow
• Popular classes are not enough
  – Alternative approach to browse the taxonomy
• The basic level of abstraction in DBpedia is user-friendly
  – But when given the freedom to choose, users suggest more specific classes
  – Domain-specific vocabulary is not welcome
14. Conclusions
• In knowledge engineering, microtask crowdsourcing has focused on improving the results of automatic algorithms
• We know too little about those cases in which algorithms fail
• No optimal workflow in sight
• The DBpedia ontology needs revision
15. Using microtasks to crowdsource DBpedia entity classification: a study in workflow design
E. Simperl, Q. Bu, Y. Li
Submitted to SWJ, 2015
Email: e.simperl@soton.ac.uk
Twitter: @esimperl