The document proposes a method to enhance existing how-to videos by crowdsourcing step-by-step annotations. It presents a multi-stage crowdsourcing workflow to extract timing, labels, and before/after images for steps. An evaluation of the method on 75 videos across domains found 80% accuracy compared to expert annotations. A preliminary user study also found that a step-by-step video player improved task performance, self-efficacy, and design quality compared to a regular player.
CHI 2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos
1. Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos
Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, Krzysztof Z. Gajos
10. Completeness & detail of step-by-step instructions are
integral to task performance.
Eiriksdottir and Catrambone, 2011
Proactive and random access via semantic indices in instructional videos leads to better task performance and learner satisfaction.
Zhang et al., 2006
Interactivity can help overcome the difficulties of
perception and comprehension. Stopping, starting and
replaying an animation can allow reinspection.
Tversky et al., 2002
16. Research Questions
Does step-by-step navigation help learners?
Preliminary user study
How can we annotate an existing how-to
video with step-by-step information?
Crowdsourcing annotation workflow
20. With ToolScape, learners will…
H1. feel more confident about their design
skills.
- self-efficacy gain
H2. believe they produced better designs.
- self-rating on designs produced
H3. actually produce better designs.
- external rating on designs produced
21. H1. Higher self-efficacy gain with ToolScape
– Four 7-point Likert-scale questions
– Mann-Whitney U test (Z=2.06, p<0.05); error bars: standard error
[Bar chart, 0–7 scale: self-efficacy gain of 1.4 with ToolScape vs. 0.1 with Baseline]
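The slides report Mann-Whitney U tests on the Likert responses. The normal-approximation form of that test can be sketched in a few lines of pure Python; the sample data below is hypothetical, not the study's, and the sketch omits the tie correction for brevity:

```python
import math

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample `a` vs. `b`, with a
    normal-approximation z-score (no tie correction, for brevity)."""
    # U counts, over all pairs, how often a-values exceed b-values;
    # ties count as 0.5.
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2.0                                  # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # sd of U under H0
    return u, (u - mu) / sigma

# Hypothetical 7-point Likert self-efficacy gains for two conditions.
toolscape = [2, 1, 2, 3, 1, 2]
baseline  = [0, 1, 0, 0, 1, 0]
u, z = mann_whitney_u(toolscape, baseline)
```

A large positive z (here ≈2.6) corresponds to the kind of significant difference the slide reports.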
22. H2. Higher self-rating with ToolScape
– One 7-point Likert-scale question
– Mann-Whitney U test (Z=2.70, p<0.01); error bars: standard error
[Bar chart, 0–7 scale: self-rating of 5.3 with ToolScape vs. 3.5 with Baseline]
24. Non-sequential video navigation
Step-level navigation links were clicked 8.9 times per task
“It is great for skipping straight to relevant
portions of the tutorial.”
“It was also easier to go back to parts I missed.”
25. Research Questions
Does step-by-step navigation help learners?
Preliminary user study
How can we annotate an existing how-to
video with step-by-step information?
Crowdsourcing annotation workflow
41. Quality control for Stage 3
• Majority voting over worker-submitted frames
• Breaking ties:
– Pixel diff to merge “similar enough” frames
– Choose the frame closer in time to the step
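The merge-then-vote idea can be sketched as follows; the grayscale nested-list frame representation and the threshold value are illustrative assumptions, not the paper's implementation:

```python
def mean_abs_diff(f1, f2):
    """Mean absolute per-pixel difference between two equal-size
    grayscale frames (nested lists of 0-255 values)."""
    total = sum(abs(p1 - p2)
                for r1, r2 in zip(f1, f2)
                for p1, p2 in zip(r1, r2))
    return total / (len(f1) * len(f1[0]))

def pick_frame(candidates, threshold=10.0):
    """Majority vote over candidate frames; 'similar enough' frames
    (pixel diff under `threshold`) are merged into one vote bucket."""
    buckets = []  # each bucket: [representative_frame, vote_count]
    for frame in candidates:
        for bucket in buckets:
            if mean_abs_diff(frame, bucket[0]) < threshold:
                bucket[1] += 1
                break
        else:
            buckets.append([frame, 1])
    # Return the representative of the largest bucket.
    return max(buckets, key=lambda b: b[1])[0]

# Three workers submitted frames; two are near-identical.
a = [[100, 100], [100, 100]]
b = [[102, 101], [99, 100]]   # within threshold of `a`: same vote
c = [[10, 10], [10, 10]]      # clearly different frame
best = pick_frame([a, b, c])
```

Because frames `a` and `b` fall into one bucket, their combined two votes beat `c`, so `best` is frame `a`.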
42. Evaluation
• Generalizable? 75 Photoshop / cooking / makeup videos
• Accurate? Precision and recall against trained annotators’ labels
43. Across all domains, ~80% precision and recall

Domain      Precision   Recall
Cooking     0.77        0.84
Makeup      0.74        0.77
Photoshop   0.79        0.79
All         0.77        0.81
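One plausible way to compute precision and recall against expert labels is to match crowd step timestamps to expert timestamps within a time tolerance. The ±5-second tolerance and greedy nearest-match rule below are assumptions for illustration, not the paper's exact scoring procedure:

```python
def precision_recall(crowd, expert, tol=5.0):
    """Greedily match crowd step times (seconds) to expert step times
    within `tol` seconds; each expert step matches at most once."""
    unmatched = list(expert)
    hits = 0
    for t in sorted(crowd):
        close = [e for e in unmatched if abs(e - t) <= tol]
        if close:
            # Consume the nearest unmatched expert step.
            unmatched.remove(min(close, key=lambda e: abs(e - t)))
            hits += 1
    precision = hits / len(crowd) if crowd else 0.0
    recall = hits / len(expert) if expert else 0.0
    return precision, recall

crowd_steps  = [12, 31, 48, 70]       # hypothetical crowd timestamps
expert_steps = [10, 30, 50, 65, 90]   # hypothetical expert timestamps
p, r = precision_recall(crowd_steps, expert_steps)
```

Here every crowd step matches an expert step (precision 1.0), but one expert step goes undetected (recall 0.8), mirroring how the two metrics can diverge in the table above.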
44. Conceptual Level Differences
• “Now apply the bronzer to your face
evenly”
• “Apply the bronzer to the forehead”
• “Apply the bronzer to the cheekbones”
• “Apply the bronzer to the jawline”
45. Timing is 2.7 seconds off on average
Ground truth: one step every 17.3 seconds
46. Cost: $1.07 per minute of video
• 111 HITs / video (3 workers / task)
• $2.50 / video (Find + Verify)
• $4.85 / video (Find + Verify + Expand)
• $0.32 / step (time + label + before/after)
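The per-video and per-minute figures can be sanity-checked with simple arithmetic. Note that the roughly 4.5-minute average video length below is inferred by dividing $4.85/video by $1.07/minute; it is not stated on the slide:

```python
# Reported figures from the slide.
cost_full_pipeline = 4.85   # $/video, Find + Verify + Expand
cost_per_minute = 1.07      # $/minute of video

# Implied average video length (an inference, not a slide figure).
avg_minutes = cost_full_pipeline / cost_per_minute   # roughly 4.5 min

def annotation_cost(minutes, rate=cost_per_minute):
    """Estimated cost to annotate a video of the given length."""
    return minutes * rate
```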
48. Ongoing Work: Beyond low-level steps
Hierarchical solution structure extraction
Catrambone, R. The subgoal learning model: Creating better examples so that students can solve novel problems. Journal of Experimental Psychology: General, 127 (1998).
49. Ongoing Work: Beyond low-level steps
Hierarchical solution structure extraction
Learnersourcing: learners as a crowd
• Motivated, qualified
• Feedback loop between learners & system
50. Future of How-to Video Learning
What if we had 1000s of
fully annotated videos?
• Flexible learning paths with multiple videos
• Step-level search, recommendation
• Patterns from multiple solutions
51. Crowdsourcing Step-by-Step Information Extraction to
Enhance Existing How-to Videos
Juho Kim
MIT CSAIL
juhokim@mit.edu
juhokim.com
Acknowledgement: This work was supported in part by
Quanta Computer & the Samsung Fellowship.