Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research
1. Crowdsourcing beyond Mechanical Turk: Building Crowdmining Services for Your Own Research
Kuan-Ta Chen
Institute of Information Science
Academia Sinica
CrowdKDD’12 Aug 12, 2012
2. What I’m going to talk about
Crowdsourcing?
Crowdsourcing + Data Mining Research?
Common Fallacies of CS4DM Research
Pomics: A Crowdmining Service
Conclusion
3. Crowdsourcing
= Crowd + Outsourcing
“soliciting solutions via open calls
to large-scale communities”
CrowdKDD’12: Crowdsourcing beyond Mechanical Turk / Kuan-Ta Chen 3
4. A more formal definition
“Crowdsourcing is the act of taking a job traditionally
performed by a designated agent (usually an employee)
and outsourcing it to an undefined, generally large
group of people in the form of an open call.” [1]
[1] Howe, Jeff. Crowdsourcing: A Definition, http://crowdsourcing.typepad.com/
5. What Can Crowdsourcing Do?
16. Perspectives for 3D Objects
Thi Phuong Nghiem, Axel Carlier, Geraldine Morin, and Vincent Charvillat,
"Enhancing online 3D products through crowdsourcing," ACM CrowdMM'12.
17. Web Site Classifier
12 USD / hour
Panos Ipeirotis, “Crowdsourcing using Mechanical Turk: Quality Management and
Scalability,” Invited Talk at CSDM 2011.
18. Photographers’ Intention
to support a task?
to capture a bad feeling?
to preserve a good feeling?
to recall later on?
to publish it online?
to show it to friends and family?
Mathias Lux, Mario Taschwer, and Oge Marques, “A Closer Look at Photographers’
Intentions: a Test Dataset,” ACM CrowdMM’12.
19. Linguistic Affective Judgement
Affective response (Snow et al. 2008)
“Closing and cancellations
top advice on flu outbreak”
USD 0.40 to label 20 headlines (140 labels)
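The per-label economics of a run like this are worth making explicit. A back-of-envelope calculation, assuming the figures on this slide (USD 0.40 total for 20 headlines and 140 labels):

```python
# Back-of-envelope cost of the labeling run above:
# USD 0.40 total, 20 headlines, 140 labels collected.
total_cost_usd = 0.40
num_headlines = 20
num_labels = 140

labels_per_headline = num_labels // num_headlines  # redundant judgments per item
cost_per_label = total_cost_usd / num_labels       # price of a single judgment

print(labels_per_headline)       # 7 redundant labels per headline
print(round(cost_per_label, 4))  # 0.0029 USD per label
```

The redundancy (7 labels per headline) is what makes the later aggregation and quality-control steps possible at all.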
20. A Lot More Examples
Document relevance evaluation
Alonso et al. (2008)
Document rating collection
Kittur et al. (2008)
Noun compound paraphrasing
Nakov (2008)
Person name resolution
Su et al. (2007)
And so on...
21. THE COMMON FALLACIES -- EXPERIENCES FROM CROWDMM’12
Thanks to CrowdMM’12 co-organizers Wei Tsang Ooi, Martha Larson, and Wei-Ta Chu; also thanks to “Crowdsourcing for Multimedia” special-issue co-guest editors Paul Bennett and Matt Lease.
22. Common Fallacies #1
Crowdsourcing is NOT JUST
conducting user studies
The crowd is uncontrollable, and tasks are performed in uncontrolled conditions
How to manage the crowd?
23. Common Fallacies #2
Crowdsourcing is NOT JUST
analyzing user-generated content
Cope with the noise in UGC, not just the information it carries.
How to manage the imperfection and diversity of UGC?
24. Common Fallacies #2
Crowdsourcing is NOT JUST
analyzing user-generated content
Put the task element in the loop:
re-purpose the creation of UGC as your own microtasks
25. Common Fallacies #3
Crowdsourcing is NOT JUST
posting tasks on Mechanical Turk
Explicit crowdsourcing
Implicit crowdsourcing
Piggyback crowdsourcing
Doan et al, "Crowdsourcing systems on the World-Wide Web," CACM, vol 54, no 4, 2011.
27. Crowdsourcing for Data Mining: Issues
Purposes:
Annotation (ground-truth generation)
Evaluation
Retrieval
Human-in-the-loop computation
Methodologies:
Recruiting
Incentives
Task design
Workflow
Learning from the crowd
Quality control
Cheat detection
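Quality control and cheat detection, the last two methodology items above, often reduce to two simple mechanisms: aggregating redundant labels by majority vote, and scoring each worker against hidden gold questions. A minimal sketch (all data here is made up for illustration):

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate redundant crowd labels for one item by plurality."""
    return Counter(labels).most_common(1)[0][0]

def gold_accuracy(worker_answers, gold):
    """Fraction of hidden gold questions a worker answered correctly;
    low scores flag likely cheaters or spammers."""
    correct = sum(1 for q, a in worker_answers.items() if gold.get(q) == a)
    return correct / len(gold)

# Toy data: three workers each label two items.
answers = {"item1": ["spam", "spam", "ham"], "item2": ["ham", "ham", "ham"]}
print({item: majority_vote(votes) for item, votes in answers.items()})
# {'item1': 'spam', 'item2': 'ham'}
```

Production systems typically go further (e.g. weighting votes by each worker's gold accuracy), but this two-step pattern is the common core.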
28. The Era of Too Many Photos
With the prevalence of digital cameras, people today use pictures to record their daily experiences.
36. Photo Comics – Baby Born
37. Photo Comics – Birthday Party
38. Photo Comics – Daily Fun
39. Media Comparison
Medium            Creation Cost  Viewer Req.  Viewer Control  Portability  Richness
Photo browsing    Low            Low          High            Low          Low
Slideshow         Medium         Low          Low             Medium       Low
Illustrated Text  High           High         High            High         High
Comic             High           Low          High            High         High
How to lower it?
40. Comic Making – Cartoonist’s Way
43. Pomics = Picture to Comics
44. Computer-Aided Storytelling
[Flow diagram] A picture passes through location/timing analysis, aesthetics analysis, and semantics analysis; automated storytelling turns the results into an auto draft; the user edits the draft into a story; machine learning on the user’s own ratings and on popularity drives user-preference adjustment, producing the final story.
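The loop above can be caricatured in a few lines of code. This is only a hedged sketch of the idea, not Pomics’ actual implementation; every function and field name below is a hypothetical stand-in:

```python
def score_aesthetics(pic):
    # Stand-in for the aesthetics analysis (e.g. sharpness, composition).
    return pic["sharpness"]

def score_semantics(pic):
    # Stand-in for the semantics analysis (e.g. detected faces, tags).
    return pic["face_count"]

def auto_draft(pictures, prefs):
    """Automated storytelling: keep the top pictures by preference-weighted score."""
    ranked = sorted(
        pictures,
        key=lambda p: prefs["aesthetics"] * score_aesthetics(p)
                      + prefs["semantics"] * score_semantics(p),
        reverse=True,
    )
    return ranked[: prefs["story_length"]]

def learn_preferences(prefs, draft, final):
    """Caricature of the learning step: nudge the weights according to how
    much of the auto draft the user actually kept after editing."""
    kept = sum(1 for p in draft if p in final) / max(len(draft), 1)
    prefs["aesthetics"] *= 0.5 + kept  # toy update rule
    return prefs
```

In the real system the preference update would be a learned model over the user’s own ratings and each story’s popularity, not a single scalar nudge, but the shape of the loop (analyze, draft, edit, learn) is the same.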
59. Picture Semantics
Love / Like / Dear
Happy
Sleepy / sleeping
Tears
Wearing a hat
NO!
60. Can Pomics Do Micro-tasks?
The answer is YES!
Users were asked to create comics from a specific album
Rewarded with a 200 MB storage quota if their books were shared by 20+ Facebook users
61. Picture Aesthetics from Microtasks
62. Picture Saliency from Microtasks
63. Crowdmining Services
Advantages
Little or no hiring cost once the right incentives are in place
Scales up easily
Can change the game rules to fit the research
Disadvantages
High development cost
Less flexible
Hard to find the right incentives (besides money)
64. Conclusion
Crowdmining is a promising and exciting area
Crowdsourcing != Mechanical Turking
A lot more can be done with crowdmining services
Build your own crowdmining service today!
65. CrowdMM 2012
(in conjunction with ACM Multimedia 2012)
Keynote: Prof. Masataka Goto
(AIST, Japan)
11 oral+poster presentations
Annotation, Evaluation, Novel applications
An industrial panel discussion
Welcome to join us! http://crowdmm.org/
66. Unleash the power of the Crowd!
Thank You!
Kuan-Ta Chen
Academia Sinica
http://www.iis.sinica.edu.tw/~swc