Crowdsourcing Transcription
Beyond Mechanical Turk
Haofeng Zhou, Denys Baskov, Matthew Lease

Matthew Lease
School of Information
University of Texas at Austin

@mattlease

ml@utexas.edu
Roadmap
• Natural Speech: Opportunity & Challenge
• Strengths & Limitations of AMT research
– e.g. AMT-based Transcription

• Qualitative review of 8 transcription providers
• Quantitative evaluation of 4 providers
• Observations & Contributions
2
The Rise of Stored Natural Speech
• Conversational speech is the most ubiquitous
form of human communication on the planet

• We can now capture & store our
conversations in new ways & at massive scale
• But… need effective technology to search
massive conversational speech archives
• Oard: “Unlocking the Potential of the Spoken Word”

3
Oral History as a Testbed

4
oh i'll you know are yeah yeah yeah yeah yeah yeah yeah
the very why don't we start with you saying anything in your
about grandparents great grandparents well as a small
child i remember only one of my grandfathers and his wife
his second wife he was selling flour and the type of
business it was he didn't even have a store he just a few
sacks of different flour and the entrance of an apartment
building and people would pass by everyday and buy a
chela but two killers of flour we have to remember related
times were there was no already baked bread so people
had to baked her own bread all the time for some strange
reason i do remember fresh rolls where everyone would
buy every day but not the bread so that was the business
that's how he made a living where was this was the name
of the town it wasn't shammay dish he ours is we be and
why i as i know in southern poland and alisa are close
5
Perfect ASR: Raw Transcription
I never left New York before I didn't know anything else
so some fellow I knew he said I have a friend that lives
in Tucson Arizona so I went to the map looked it up I
never heard of Tucson he says I'll write him a letter and
when you go there you could stay with him so he did he
wrote a letter and his friend he was a dentist he invited
me to come over there and spend a week with him

6
Rich Transcription
[so I didn't] * I never left New York before.
I didn't know anything else.
So some fellow I knew [mentioned that] <uh> * he said I have
a friend that lives [in Arizona] * in Tucson Arizona.
So I went to the map looked it up.
<um> I never heard of Tucson.
<uh and anyhow> He says <well> I'll write him a letter and
when you go there you could <uh> stay with him.
So he did.
He wrote a letter.
And his friend, he was a dentist.
He invited me to come over there and spend a week with him.
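The relationship between the rich and raw transcripts above can be sketched in a few lines of Python. This is only an illustration: it assumes, based on the example alone, that square brackets followed by `*` mark speech the speaker revised, and that angle brackets mark fillers. The function name `rich_to_raw` is hypothetical and not part of any tool mentioned in the slides.

```python
import re


def rich_to_raw(text: str) -> str:
    """Strip rich-transcription annotations, leaving only the fluent words.

    Assumed conventions (inferred from the slide, not from a spec):
      <...>      filler words such as <uh> or <well>
      [...] *    revised speech, followed by a '*' marker
    """
    # Remove fillers first so that "[...] <uh> *" collapses cleanly.
    text = re.sub(r"<[^>]*>", " ", text)
    # Remove the revised span together with its optional '*' marker.
    text = re.sub(r"\[[^\]]*\]\s*\*?", " ", text)
    # Drop sentence punctuation, which the raw transcript does not carry.
    text = re.sub(r"[.,]", "", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    print(rich_to_raw("[so I didn't] * I never left New York before."))
    print(rich_to_raw("So some fellow I knew [mentioned that] <uh> * he said I have"))
```

The sketch keeps capitalization as written, so it does not reproduce the raw slide text exactly; it only shows how much of the rich markup is recoverable structure rather than spoken content.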
7
Transcription Research via AMT
• Audhkhasi et al. (2011)
• Evanini et al. (2010)
• Gruenstein et al. (2009)
• Lee et al. (2011)
• Marge et al. (2010)
• Novotney et al. (2010)
• Parent et al. (2010)
• Williams et al. (2011)
8
9
Why Eytan Adar hates MTurk
Research (CHI 2011 CHC Workshop)
• Overly-narrow focus on MTurk
– Identify general vs. platform-specific problems
– Academic vs. Industrial problems

• How much should we focus on “...writing
the user’s manual for MTurk ... struggl[ing]
against the limits of the platform...”?
10
HCOMP 2013 Panel
Anand Kulkarni: “How do we
dramatically reduce the complexity of
getting work done with the crowd?”

Greg Little: How can we post a task
and with 98% confidence know we’ll
get a quality result?
11
Beyond AMT: An Analysis of
Crowd Work Platforms
• Vakharia & Lease, arXiv online 2013
• Near-exclusive research focus on AMT
risks its particular vagaries and limitations
overly shaping our understanding of crowd
work and the research questions and
directions being pursued.
• We present a cross-platform content
analysis of seven crowd work platforms.
12
Transcription Providers

13
Qualitative Analysis
• Base Price
• Accuracy
• Transcript Formats
• Time stamps
• Speaker Identification/Changes
• Verbatim
• Turnaround Time
• Difficult Audio Surcharge
14
Experiment
• 10-minute segments from 6 interviews
– USC-SFI MALACH English corpus (LDC2012S05)

• 4 low-cost service providers
– CastingWords (CW)
– Transcription Hub (TH)
– 1-888-Type-It-Up (VerbalFusion, VF)
– oDesk: 3 workers

• Format Issues & Data Cleaning
• Aligned with revised CMU Sphinx code
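As a rough illustration of what word-level alignment measures, the sketch below computes word error rate (WER) as the word-level edit distance between a reference and a hypothesis transcript, divided by the reference length. It is not the revised CMU Sphinx code used in the experiment; the function name `word_error_rate` and the example hypothesis are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("he was selling flour", "he was selling flower"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.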
15
Word Error Rate (WER) vs. Cost
Each interview cell lists three values in the order shown on the slide. A "-" marks an interview that the oDesk worker did not transcribe.

| Service Provider with Price Rate | 00017 | 00038 | 00042 | 00058 | 00740 | 13078 | Avg. by Provider | Accuracy/$ Ratio |
| CastingWords (CW) ($60/hr per audio) | 31.356 / 9.707 / 0.154 | 33.198 / 17.005 / 0.881 | 23.273 / 14.885 / 0.822 | 28.624 / 15.976 / 0.814 | 16.833 / 11.643 / 1.996 | 26.452 / 14.129 / 2.119 | 26.623 / 13.891 / 1.131 | 1.435 |
| Transcription Hub (TH) ($45/hr per audio) | 30.233 / 8.450 / 0.155 | 34.628 / 18.405 / 1.022 | 29.129 / 18.308 / 1.221 | 33.433 / 18.399 / 1.197 | 18.071 / 9.036 / 2.495 | 28.874 / 14.588 / 2.116 | 29.061 / 14.531 / 1.368 | 1.899 |
| 1-888-Type-It-Up (VF) (avg $125/hr per audio) | 28.874 / 9.524 / 0.151 | 26.819 / 11.051 / 1.011 | 18.543 / 11.175 / 0.662 | 23.921 / 11.658 / 0.454 | 12.559 / 6.212 / 2.296 | 24.072 / 10.977 / 2.120 | 22.465 / 10.099 / 1.116 | 0.719 |
| oDesk Worker1 (OD1) ($5.56/hr per work) | 31.144 / 10.510 / 0.155 | 29.787 / 16.884 / 1.098 | - | - | - | - | 30.465 / 13.697 / 0.626 | 15.522 |
| oDesk Worker2 (OD2) ($11.11/hr per work) | - | - | - | - | 20.066 / 12.226 / 2.591 | 28.495 / 14.973 / 2.597 | 24.281 / 13.600 / 2.594 | 7.777 |
| oDesk Worker3 (OD3) ($13.89/hr per work) | - | - | 34.415 / 22.545 / 1.623 | 37.983 / 19.228 / 1.734 | - | - | 36.199 / 20.886 / 1.678 | 5.696 |
| Avg. by Interview | 30.402 / 9.548 / 0.154 | 31.108 / 15.836 / 1.003 | 26.340 / 16.728 / 1.082 | 30.990 / 16.315 / 1.050 | 16.883 / 9.779 / 2.345 | 26.973 / 13.667 / 2.238 | 28.183 / 14.451 / 1.419 | |

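The slide does not define the Accuracy/$ Ratio column. One reading that appears to reproduce the printed values is (100 − second "Avg. by Provider" value) / hourly price, which the sketch below checks. This formula is an inference from the numbers, not a definition given by the authors; the oDesk rows are read from the flattened slide text, and the names `ROWS` and `accuracy_per_dollar` are hypothetical.

```python
# provider: (price in $/hr, second "Avg. by Provider" value, ratio printed on the slide)
ROWS = {
    "CW": (60.00, 13.891, 1.435),
    "TH": (45.00, 14.531, 1.899),
    "VF": (125.00, 10.099, 0.719),
    "OD1": (5.56, 13.697, 15.522),
    "OD2": (11.11, 13.600, 7.777),
    "OD3": (13.89, 20.886, 5.696),
}


def accuracy_per_dollar(price: float, error_rate: float) -> float:
    """Assumed formula: accuracy (100 - error rate, in percent) per dollar of hourly price."""
    return (100.0 - error_rate) / price


if __name__ == "__main__":
    for name, (price, error_rate, printed) in ROWS.items():
        computed = round(accuracy_per_dollar(price, error_rate), 3)
        status = "matches" if computed == printed else "differs from"
        print(f"{name}: computed {computed:.3f} {status} printed {printed:.3f}")
```

Under this reading, the low hourly rates of the oDesk workers dominate the ratio even where their error values are higher, which is why the ratio should be weighed alongside the management effort that the price does not capture.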

16
Errors Distribution in WER
[Bar chart: number of errors (y-axis, 0 to 3000) for each provider (CW, OD, TH, VF), broken down by error category: Misc, Name, Alignment, PostError, Spelling, Revision, Repetition, Filler, RefError, Partial, Background]
17
Hidden Costs
• Management costs beyond Base Price
– Crowdsourcing studies rarely discuss other
costs (other costs dwarf crowd costs…)

• CW, TH and VF charge higher prices than oDesk
• But… oDesk: no management cost in the
price rate, but additional effort was needed
– communicate with workers to negotiate price
– clarify requirements, and monitor work
– take risk of low quality or late/no delivery
18
Contributions
• Snapshot in time of current crowdsourcing
transcription providers & offerings beyond AMT
– Those looking for alternatives today
– Retrospective studies

• Quantitative WER vs. cost for spontaneous
speech transcription across providers
• Discussion of tradeoffs among quality, cost,
risk & effort in crowdsourcing transcription
19
Thank You!

Matt Lease
ml@utexas.edu
Slides: www.slideshare.net/mattlease

ir.ischool.utexas.edu
