Crowdsourcing metadata for audiovisual collections: from free text tags to semantic concepts
7 December 2011 | DISH | Rotterdam
Session: http://www.dish2011.nl/sessions/new-models-of-interaction-glams-linked-open-data-and-user-participation
1. Crowdsourcing metadata for audiovisual collections: from free text tags to semantic concepts
Lotte Belice Baltussen – Sound and Vision
7 December 2011 | DISH
3. Waisda? What’s that?
A game that lets people annotate audiovisual archive material.
4. Added value
• Time-related metadata
• Social tagging (bridging the semantic gap)
• Interaction between the archive/broadcaster and the public
• Gathering data for further research
• Efficiency? Manual annotation of video takes up to 5× the length of the video
• New business model?
5. Project partners pilot
• Netherlands Institute for Sound and Vision
(project management, content, research)
• KRO (concept, content, PR)
• VU (research within PrestoPRIME)
• Q42 (developer)
6. Man bijt hond Woordentikkertje
After evaluation:
• Improved interface
• New scoring mechanisms (semantics)
• New content
• More feedback
9. How does it work?
Players choose from ‘channels’ with different episodes.
10. How does it work?
Scoring (which also acts as a filter):
• Basic rule: players score points when their tag exactly matches a tag entered by another player within 10 seconds
• Multiple other scoring mechanisms create various tagging incentives
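The basic scoring rule above can be sketched in a few lines. This is a minimal illustration of the idea, not the actual Waisda? implementation; the class and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TagEntry:
    player: str   # player name
    tag: str      # the tag text as typed
    time: float   # seconds into the video when the tag was entered

def matches(new: TagEntry, earlier: list[TagEntry], window: float = 10.0) -> list[TagEntry]:
    """Return earlier entries by *other* players whose tag exactly
    matches the new tag within the time window (the basic scoring rule)."""
    return [
        e for e in earlier
        if e.player != new.player          # no points for matching yourself
        and e.tag == new.tag               # exact tag match
        and abs(e.time - new.time) <= window  # within 10 seconds
    ]

# Two players type "dog" nine seconds apart: a match, so both score.
log = [TagEntry("anna", "dog", 42.0)]
print(matches(TagEntry("ben", "dog", 51.0), log))  # one match
print(matches(TagEntry("ben", "dog", 60.0), log))  # no match: 18 s apart
```

The additional scoring mechanisms mentioned above (rewarding rarer or more specific tags, for example) would layer extra rules on top of this basic match.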
12. Generating a constant flow of traffic is a challenge!
Important: partners, and publicity on external websites with relevant communities and large numbers of visitors.
Example (FWAW, in one week):
• Number of tags tripled to 160,000
• Number of registered players doubled to 362
13. Outcomes
• Stats
  • 340,551 tags added to 604 items; 42,068 unique tags
  • 39,134 pageviews, 555 registered players, 10,926 visits
  • Average playing time 6 min 45 s; 4,287 sessions
• [Chart: matches within Waisda? vs. matches with GTAA / Cornetto]
15. Evaluation
AV documentalists:
• Tags mostly describe short fragments and are often not very specific; they don’t describe a programme as a whole.
• BUT! This can be addressed by filtering and mapping free text tags to existing vocabularies.
• The WNW tags were the most useful and specific; content influences specificity.
• Tags can be used in different ways, and their relevance varies per user group.
• Documentalists are excited about further development!
20. Waisda? vs. Woordentikkertje

                  Waisda?          Woordentikkertje
Months            8                4.5
Videos            648              2,892
Players           2,435            689
Tags – total      428,832          392,860
Tags – unique     48,242 (11%)     43,407 (11%)
Matches:
  Players         156,546 (37%)    215,156 (55%)
  Geo. names*     6,089 (1.4%)     23,142 (5.8%)
  Persons*        107 (0.25%)      2,423 (0.6%)

* For Waisda? we looked at unique tags; for Woordentikkertje at the total number of tags.
21. Tips and lessons learned so far
• What are your success criteria?
• How do you define your target users, and
how do you reach them?
• How do you motivate your target users?
• Read existing reports and literature!
• Keep learning and improving!
25. Future work
• Open Source version of Waisda?
• Crowdsourcing Olympics
• More research into the added value of tags for
retrieval (subtitle comparison, tests with
various end users, more research on linking
semantically rich sources to tags)
26. ...recommended sources
blogs, feeds, people
• http://museumtwo.blogspot.com/
• http://80gb.wordpress.com/
• http://themuseumofthefuture.com/
• http://www.delicious.com/RuncocoProject/
• @ammeveleigh
• @archivesopen
• @digitalst
• @microtask
• @mia_out
• @museweb
• @runcoco
• @wittylama
This presentation is partly based on Oomen & Aroyo 2011:
http://www.slideshare.net/PaulaUdondek/crowdsourcing-in-het-cultureel-erfgoed-kansen-uitdagingen
27. Thanks!
@lottebelice / lbbaltussen@beeldengeluid.nl
Big thank you to:
B&G: @johanoomen / @mbrinkerink
VU: @laroyo / @McHildebrand
http://blog.waisda.nl
http://woordentikkertje.manbijthond.nl