This document outlines the content of a Digital Humanities course covering crowdsourcing. The course aims to answer why projects rely on crowdsourcing and why people participate in crowdsourced projects. It discusses cases like using Mechanical Turk for transcription and crowdfunding scientific projects. Reasons projects use crowdsourcing include low costs and better user engagement. People participate for intrinsic motivations like contributing to open data maps or the challenge of climbing the ranks on Wikipedia. The document explores examples like OpenStreetMap, Wikipedia editing as a game, and Quora's gamified approach to soliciting expert answers.
DH101 2013/2014 course 9 - Crowdsourcing, crowdfunding, Wikipedia, Open Street Map, Mechanical Turk
1. Digital Humanities 101 - 2013/2014 - Course 9
Digital Humanities Laboratory
Frédéric Kaplan
frederic.kaplan@epfl.ch
2.
Semester 1 : Content of each course
• (1) 19.09 Introduction to the course / Live Tweeting and Collective note
taking
• (2) 25.09 Introduction to Digital Humanities / Wordpress / First assignment
• (3) 2.10 Introduction to the Venice Time Machine project / Zotero
• 9.10 No course
• (4) 16.10 Digitization techniques / Deadline first assignment
• (5) 23.10 Datafication / Presentation of projects
• (6) 30.10 Semantic modelling / RDF / Deadline peer-reviewing of first
assignment
Digital Humanities 101 - 2013/2014 - Course 9 | 2013
3.
Semester 1 : Content of each course
• (7) 6.11 Pattern recognition / OCR / Semantic disambiguation
• (8) 13.11 Historical Geographic Information Systems, Procedural modeling /
City Engine / Deadline Project selection
• (9) 20.11 Crowdsourcing / Gamification / Wikipedia
• (10) 27.11 Cultural heritage interfaces and visualisation / Museographic
experiences
• 4.12 Group work on the projects
• 11.12 Oral exam / Presentation of projects / Deadline Project blog
• 18.12 Oral exam / Presentation of projects
4.
Today's course
• Objective of the course : Answering two questions : Why do projects rely on
crowdsourcing ? Why do people participate in crowdsourced projects ?
• Why do projects rely on crowdsourcing ?
• Case study : Transcribing handwritten texts using Mechanical Turk
• Case study : Crowdfunding a scientific project
• Why do people participate in crowdsourced projects ?
• Case study : Climbing the Wikipedia pyramid
6.
From Wikipedia
• “Crowdsourcing is the practice of
obtaining needed services, ideas, or
content by soliciting contributions from
a large group of people, and especially
from an online community, rather than
from traditional employees or suppliers.”
7.
From Wikipedia
• The term was coined in 2006 by Jeff
Howe in a Wired article, “The Rise of
Crowdsourcing”.
http://www.wired.com/wired/archive/14.06/crowds.html?pg=1&topic=crowds
8.
Why do projects rely on crowdsourcing ? Why do people
participate in crowdsourced projects ?
9.
Why do projects rely on crowdsourcing ?
• Because it is free or cheap (cf. Amazon’s Mechanical Turk)
• Because it engages users better (or learners, in the case of peer grading)
• Because it makes it possible to harness the wisdom of the crowds
• cf. Claire Ross, Social media for digital humanities and community engagement, in Warwick,
Terras, Nyhan, Digital Humanities in Practice, Facet Publishing, 2012.
10.
The wisdom of the crowds
• Surowiecki’s four criteria (2004)
• Diversity : Each participant has different
background and perspectives
• Independence : Each participant makes
their own decision
• Decentralization : Decisions are local, no
central planner
• Aggregation : A way to turn individual
judgements into collective decisions.
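The aggregation criterion can be illustrated with a small sketch. It assumes that each participant independently submits a numeric estimate (the values below are invented for illustration) and uses the median as one simple aggregation rule among many; the function name is hypothetical.

```python
from statistics import median


def aggregate_estimates(estimates):
    """Turn independent individual estimates into one collective estimate."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    # The median is robust to a few extreme guesses; a mean or a vote
    # would be other possible aggregation rules.
    return median(estimates)


# Hypothetical guesses from five independent participants (made-up numbers).
guesses = [950, 1100, 1210, 1250, 1400]
print(aggregate_estimates(guesses))
```

The sketch only covers aggregation; diversity, independence and decentralization depend on how the guesses are collected, not on the code.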
12.
A case study : crowdsourced transcription
13.
UCL Transcribe Bentham
• 60 000 manuscripts of Jeremy Bentham
(1748-1832)
• 20 000 already transcribed using a
traditional approach, 40 000 to go
• TEI encoding. Uses MediaWiki
• 5 000 manuscripts transcribed (06-2013)
• 33 000 volunteers but a very limited
number of very productive and dedicated
users
• Crowdsifting instead of crowdsourcing
14.
ReCaptcha : A free anti-bot service
• From http://www.google.com/recaptcha/learnmore
• 200+ million CAPTCHAs are solved by
humans around the world every day.
• 10 s / CAPTCHA
• More than 150 000 hours of work each day
15.
ReCaptcha : A free anti-bot service
• reCAPTCHA improves the process of
digitizing books by sending words that
cannot be read by computers to the
Web in the form of CAPTCHAs for
humans to decipher.
• But if a computer can’t read such a
CAPTCHA, how does the system know
the correct answer to the puzzle ?
16.
ReCaptcha : A free anti-bot service
• Each new word that cannot be read
correctly by OCR is given to a user in
conjunction with another word for which
the answer is already known.
• If they solve the one for which the
answer is known, the system assumes
their answer is correct for the new one.
The system then gives the new image to
a number of other people to determine,
with higher confidence, whether the
original answer was correct.
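The control-word mechanism described above can be sketched as follows. This is an illustration of the idea only, not reCAPTCHA's actual implementation: the function names, the case-insensitive comparison and the `min_votes` threshold are all assumptions.

```python
from collections import Counter


def is_trusted(control_answer, control_guess):
    """A user is trusted only if they solve the word whose answer is known."""
    return control_guess.strip().lower() == control_answer.strip().lower()


def resolve_unknown(submissions, control_answer, min_votes=3):
    """Decide the transcription of the unknown word.

    submissions: (control_guess, unknown_guess) pairs from different users.
    Returns the agreed word, or None if fewer than min_votes trusted
    users gave the same answer (min_votes is a hypothetical threshold).
    """
    votes = Counter(
        unknown.strip().lower()
        for control, unknown in submissions
        if is_trusted(control_answer, control)
    )
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Made-up example: the known word is "morning", the unknown word is disputed.
submissions = [
    ("morning", "upon"),
    ("morning", "upon"),
    ("mornin", "open"),   # fails the control word, so this guess is ignored
    ("morning", "upon"),
    ("morning", "uqon"),
]
print(resolve_unknown(submissions, "morning"))
```

Giving the same unknown word to several people and counting agreeing answers is what raises the confidence in the final transcription.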
17.
Can we use Mechanical Turk to do this ?
• Who knows where the name Mechanical Turk comes from ?
• Mechanical Turk lets requesters post Human Intelligence Tasks (HITs) for
workers to perform
• When designing a HIT, a requester chooses among many templates,
including writing, survey, translation and categorization templates.
• 500 000 workers from over 190 countries in January 2011.
• Payments are made through Amazon Payments. Requesters pay Amazon 10 % of
the price of successfully completed HITs
• The average wage is about one dollar an hour (each task averaging a few
cents). Some have criticized Mechanical Turk as a digital sweatshop. We will
discuss this more at the end of this lecture.
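As a rough worked example of the figures above, the sketch below computes what a requester would pay for a batch of HITs. It assumes the 10 % commission is charged on top of the rewards paid to workers and that the fee is rounded down to a whole cent; the function name and the sample numbers are hypothetical.

```python
def requester_cost_cents(reward_cents, completed_hits, fee_percent=10):
    """Return (rewards, fee, total) in cents for successfully completed HITs."""
    rewards = reward_cents * completed_hits
    # Integer arithmetic avoids floating-point rounding; rounding the fee
    # down to a whole cent is an assumption of this sketch.
    fee = rewards * fee_percent // 100
    return rewards, fee, rewards + fee


# Made-up batch: 1000 completed HITs paid 5 cents each.
rewards, fee, total = requester_cost_cents(reward_cents=5, completed_hits=1000)
print(f"rewards: {rewards / 100:.2f}  fee: {fee / 100:.2f}  total: {total / 100:.2f}")
```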
18.
CrowdFlower : a meta-engine for crowdsourcing
• CrowdFlower acts as a meta-engine, or interface, to several
crowdsourcing services.
• CrowdFlower has over 50 labor channel partners, among them Amazon
Mechanical Turk
• It has processed 1 billion tasks (small units of work) since it began operating,
and currently completes 5 man-years of work daily (Source : Wikipedia, 19/11/2013)
• So let’s try it.
29.
Combining crowdsourcing and grammatical rules
• Raw crowdsourced word transcriptions are likely to contain many errors
• But we also have a good grammatical model of this Venetian dialect (thanks
to the work of Lorenzo Tomasin) and a lot of Venetian transcriptions.
• Many errors could be automatically corrected using these two sources of information.
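One simple way to picture such automatic correction is to snap each crowdsourced word to the closest entry of a known word list. The sketch below does only that, using Python's standard `difflib`; it stands in for the grammatical model, which is not described here. The lexicon, the sample words and the `cutoff` value are invented for illustration.

```python
import difflib


def correct_transcription(words, lexicon, cutoff=0.8):
    """Replace each word by its closest lexicon entry when one is similar enough."""
    corrected = []
    for word in words:
        if word in lexicon:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates above the cutoff,
        # most similar first; an empty list means "leave the word as it is".
        matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


# Hypothetical lexicon built from existing transcriptions.
lexicon = ["ducati", "bottega", "mercante", "calle"]
print(correct_transcription(["ducatl", "bottega", "xyz"], lexicon))
```

Words with no close match are kept unchanged, so they can be flagged for a human check rather than silently altered.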
30.
Survey : Do you want to use crowdsourcing in your next
semester’s project ? Should the DHLAB sponsor this ?
Answer on Framapad
31.
What about crowdfunding your research project ?
33.
Crowdfunding in general
• Kickstarter : 5.2 million people have pledged $882 million, funding 52 000
projects.
• Kiva : 600 000+ lenders have channelled almost $275 million to entrepreneurs
in the developing world.
• Obama’s 2008 election campaign : $780 million, much of it from small online
donations.
34.
Example of a scientific Kickstarter project
http://www.kickstarter.com/projects/1616707907/virtual-prehistoric-worlds
38.
Crowdfunding sites
• Indiegogo : http://www.indiegogo.com/
• France : http://www.ulule.com/
• Switzerland : http://wemakeit.ch/
39.
After the break, we will talk about Wikipedia and
Gamification. In the meantime you can try Wikirace
http://wikirace.christopherdebeer.com/
40.
Why do people participate in crowdsourcing projects ?
48.
Is Wikipedia a good resource ?
• Some academics argue that the use of Wikipedia is not appropriate for
scholarly settings, because it is collectively built by amateurs.
• Achterman, D. (2005) Surviving Wikipedia : improving student search habits through information
literacy and teacher collaboration, Knowledge Quest, 33 (5), 38-40
• Black, E. (2007) Wikipedia and Academic Peer Review : Wikipedia as a recognized medium for
scholarly publications ? Online Information Review, 32 (1), 73-88
49.
Wikipedia is in perpetual beta, constantly getting better
• Wikipedia is updated and improved at a much faster pace than
scholarly edited encyclopedias.
• It improves all the time.
• Several recent studies have shown that Wikipedia can equal or outperform
other traditionally edited encyclopedias in terms of accuracy.
• Giles, J. (2005), Internet Encyclopaedias Go Head to Head, Nature, 438, 900-1
50.
Wikipedia creates a diplomatic zone
• Wikipedia manages to create a diplomatic zone, where conflicts between
different perspectives can be resolved in search of a common neutral consensus.
This is a definite advantage over static (online or printed)
encyclopedias.
• For diplomacy in general, see Bruno Latour, Enquête sur les modes d’existence : Une
anthropologie des Modernes, La Découverte, 2012.
• Bryant, S. et al (2005) Becoming Wikipedian, in Group 05, 1-10, ACM Press
51.
Wikipedia is perceived as a common good
• It is backed by many users all over the world
• Therefore, it is one of the rare digital resources that is bound to last.
54.
Foursquare is a game and a mapping service
• In recent years, we have seen several
examples of successful creation of
collective knowledge bases using
addictive games.
• This is a particular case of Gamification
55.
Twitter is a game
• One could argue that services for
sharing/constructing collective
knowledge online are also games (even if
they are not presented as such).
• The success of Twitter is linked to its
smooth onboarding process
• We discussed this case in course 1.
56.
Quora is a game
58.
Quora's strategy
• Quora must attract qualified contributors to write high quality answers to
questions.
• Can you think of a strategy to reach this goal ?
59.
To reach this goal, Quora chose a very clear strategy :
personalize the answers, anonymize the questions
61.
Quora's strategy
• Questions are not owned by the person who asks them.
• They are immediately treated as a common good that can be updated and
modified by anyone.
• In contrast, the interface strongly associates users with their answers.
• The systematic juxtaposition of the user’s identity (incl. picture, name
and short bio) with their answers introduces an equivalence between the value of
a user and the value of their answers.
62.
Quora's strategy
• In addition, Quora introduces an explicit ranking system : the best rated
answers are shown first.
• Each question is thus a competition between Quora’s users.
• The one who provides the best answer wins the game.
• As on Twitter, users grasp Quora’s implicit rules as they play and
learn what they must do to play well in this particular kind of game.
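The explicit ranking can be pictured as a simple sort by rating. The sketch below is not Quora's actual algorithm, which is not described here; the `Answer` fields, the integer rating and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    author: str   # the answer is tied to an identified user
    text: str
    rating: int   # hypothetical score, e.g. net votes from readers


def rank_answers(answers):
    """Show the best rated answers first; the top one 'wins' the question."""
    return sorted(answers, key=lambda a: a.rating, reverse=True)


# Made-up answers to a single (anonymous) question.
answers = [
    Answer("alice", "A short reply", 4),
    Answer("bob", "A detailed, sourced reply", 17),
    Answer("carol", "An off-topic reply", -2),
]
for answer in rank_answers(answers):
    print(answer.rating, answer.author)
```

Because the author is displayed next to each ranked answer, the position in the list doubles as a public score for the user.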
63.
What kind of game is Wikipedia ?
64.
Wikipedia is an MMORPG (Massively Multiplayer Online
Role-Playing Game)
65.
Wikipedia
• Onboarding : No need to be identified to start contributing. But this is
necessary to climb the tiers.
• Registering is like reaching level 1
• By registering, the user gets a few new powers. He can have his own webpage.
He can vote.
• These are first steps to motivate him to progressively discover and climb the
levels of the big pyramid associated with each version of Wikipedia.
66.
Wikipedia
• How can one climb the tiers ? What privileges do the more powerful
users have ? The new contributor does not know yet.
• A contributor who persists will discover that there are different jobs in the
Wikipedia world.
• Administrators, Bureaucrats, Stewards, Mediators, Judges, Bot creators,
Importers, Oversighters, IP Checkers.
67.
Administrators
• Administrators are responsible for cleaning up particular pages, checking
copyright issues and repairing vandalism.
• All these tasks can be done by a normal user, but an administrator has access
to special powers
• delete irrelevant pages
• protect some pages against change
• block certain users
• rename pages
• mask the history of particular pages.
68.
Administrators
• How does one become an administrator ? By being elected.
• The following criteria are recommended :
• a very good understanding of the wiki syntax, rules and overall functioning of the local version of
Wikipedia
• participation in maintenance work
• around 3000 contributions
• at least one year of significant activity
• The election is held on a given day and the candidate must obtain a clear
majority (this notion is not precisely defined in the French version of
Wikipedia)
69.
IP Checker
• An IP Checker has access to the check-user function, which reveals the
connection between a user’s IP address and their account. To become an IP
Checker, one must be approved by the arbitration committee.
• Only 5 people have this privilege on the French version of Wikipedia.
70.
Oversighter
• An Oversighter can mask a username from all public records
• mask a comment
• mask a version of a page
• suppress a page and mask it even from administrators
• see the Oversighters’ special records
71.
Bot creators
• Among the 30 most active editors on Wikipedia, 2/3 are bots
• Bots perform repetitive tasks and can act on Wikipedia pages like a real
Wikipedia user (generate, edit or delete an article, translate part of an
article, resolve homonymy issues, revert vandalism)
• Only a bureaucrat or a steward can allow someone to be a bot creator.
72.
Bureaucrats
• Bureaucrats manage the status of other users (administrators, bots,
bureaucrats).
• Only 8 people have this privilege on the French version of Wikipedia.
73.
Stewards are super bureaucrats
• Stewards are appointed by the international committee. They can manage the
status of all other contributors.
• There are only 3 stewards on the French version of Wikipedia.
74.
Mediators
• They can intervene in disputes but cannot vote on or recommend a
punitive action.
75.
Judges
• They can impose a punitive action
• The ArbCom (Arbitration Committee) of the English version of Wikipedia has
only 15 members.
76.
Wikipedia also has its foundational stories
• The Essjay controversy : Essjay was an eminent member of the Wikicratia,
combining the roles of administrator, bureaucrat, judge and mediator.
He was caught lying in the bio on his Wikipedia user page and was
banned.
77.
World of Warcraft is so boring compared to Wikipedia
78.
World of Warcraft is so boring compared to Wikipedia
• Ordinary clerks by day, Wikipedians by night.
• On Wikipedia, with time and perseverance each player can lead a double life,
hidden behind a pseudonym, and earn new powers as hard-won as those
of a great wizard in a heroic-fantasy role-playing game.
• When I wrote a first blog post on this issue, a French Wikipedia Bureaucrat
pointed me to a relatively well-hidden page describing Wikipedia as an
MMORPG. http://fr.wikipedia.org/wiki/Wikipedia:MMORPG
79.
Open debate : are crowdsourcing and gamification
ethical ?