Crowdsourcing Transcription
Beyond Mechanical Turk
Haofeng Zhou, Denys Baskov, Matthew Lease

Matthew Lease
School of Information
University of Texas at Austin

@mattlease

ml@utexas.edu
Roadmap
• Natural Speech: Opportunity & Challenge
• Strengths & Limitations of AMT research
– e.g. AMT-based Transcription

• Qualitative review of 8 transcription providers
• Quantitative evaluation of 4 providers
• Observations & Contributions
2
The Rise of Stored Natural Speech
• Conversational speech is the most ubiquitous
form of human communication on the planet

• We can now capture & store our
conversations in new ways & at massive scale
• But… need effective technology to search
massive conversational speech archives
• Oard: “Unlocking the Potential of the Spoken Word”

3
Oral History as a Testbed

4
oh i'll you know are yeah yeah yeah yeah yeah yeah yeah
the very why don't we start with you saying anything in your
about grandparents great grandparents well as a small
child i remember only one of my grandfathers and his wife
his second wife he was selling flour and the type of
business it was he didn't even have a store he just a few
sacks of different flour and the entrance of an apartment
building and people would pass by everyday and buy a
chela but two killers of flour we have to remember related
times were there was no already baked bread so people
had to baked her own bread all the time for some strange
reason i do remember fresh rolls where everyone would
buy every day but not the bread so that was the business
that's how he made a living where was this was the name
of the town it wasn't shammay dish he ours is we be and
why i as i know in southern poland and alisa are close
5
Perfect ASR: Raw Transcription
I never left new York before I didn't know anything else
so some fellow I knew he said I have a friend that lives
in Tucson Arizona so I went to the map looked it up I
never heard of Tucson he says I'll write him a letter and
when you go there you could stay with him so he did he
wrote a letter and his friend he was a dentist he invited
me to come over there and spend a week with him

6
Rich Transcription
[so I didn't] * I never left New York before.
I didn't know anything else.
So some fellow I knew [mentioned that] <uh> * he said I have
a friend that lives [in Arizona] * in Tucson Arizona.
So I went to the map looked it up.
<um> I never heard of Tucson.
<uh and anyhow> He says <well> I'll write him a letter and
when you go there you could <uh> stay with him.
So he did.
He wrote a letter.
And his friend, he was a dentist.
He invited me to come over there and spend a week with him.
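The relationship between the rich transcript above and the raw transcript on the previous slide can be sketched in a few lines of Python. This is only an illustration under an assumed reading of the markup, inferred by comparing the two slides: `[...] *` marks an abandoned phrase followed by its restart, and `<...>` marks a filler. The function name `to_raw` is hypothetical and is not part of any tool named in the slides.

```python
import re

# Assumed markup conventions (inferred from the slides, not documented there):
#   <uh>, <well>, ...      fillers
#   [abandoned words] *    a revision: the bracketed words were restarted
FILLER = re.compile(r"<[^>]*>")
REVISION = re.compile(r"\[[^\]]*\]\s*\*\s*")


def to_raw(rich: str) -> str:
    """Strip filler and revision markup, leaving the fluent word sequence."""
    text = FILLER.sub(" ", rich)       # drop fillers first
    text = REVISION.sub("", text)      # then drop abandoned phrases and the '*'
    return " ".join(text.split())      # collapse leftover whitespace


if __name__ == "__main__":
    line = "So some fellow I knew [mentioned that] <uh> * he said I have"
    print(to_raw(line))
```

Punctuation and capitalization are left untouched here; a comparison against ASR output would normally normalize those as well.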
7
Transcription Research via AMT
• Audhkhasi et al. (2011)
• Evanini et al. (2010)
• Gruenstein et al. (2009)
• Lee et al. (2011)
• Marge et al. (2010)
• Novotney et al. (2010)
• Parent et al. (2010)
• Williams et al. (2011)
8
9
Why Eytan Adar hates MTurk
Research (CHI 2011 CHC Workshop)
• Overly-narrow focus on MTurk
– Identify general vs. platform-specific problems
– Academic vs. Industrial problems

• How much should we focus on “...writing
the user’s manual for MTurk ... struggl[ing]
against the limits of the platform...”?
10
HCOMP 2013 Panel
Anand Kulkarni: “How do we
dramatically reduce the complexity of
getting work done with the crowd?”

Greg Little: How can we post a task
and with 98% confidence know we’ll
get a quality result?
11
Beyond AMT: An Analysis of
Crowd Work Platforms
• Vakharia & Lease, arXiv online 2013
• Near-exclusive research focus on AMT
risks its particular vagaries and limitations
overly shaping our understanding of crowd
work and the research questions and
directions being pursued.
• We present a cross-platform content
analysis of seven crowd work platforms.
12
Transcription Providers

13
Qualitative Analysis
• Base Price
• Accuracy
• Transcript Formats
• Time stamps
• Speaker Identification/Changes
• Verbatim
• Turnaround Time
• Difficult Audio Surcharge
14
Experiment
• 10-minute segments from 6 interviews
– USC-SFI MALACH English corpus (LDC2012S05)

• 4 low-cost service providers
– CastingWords (CW)
– Transcription Hub (TH)
– 1-888-Type-It-Up (VerbalFusion, VF)
– oDesk: 3 workers

• Format Issues & Data Cleaning
• Aligned with revised CMU Sphinx code
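For readers unfamiliar with the metric, word error rate can be sketched as a word-level edit distance divided by the length of the reference transcript. The code below is a minimal, generic illustration and not the revised CMU Sphinx alignment code mentioned above. The function name `word_error_rate` and the simple lowercase-and-split tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Standard Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "i never left new york before"
    hyp = "i never left york before"
    print(f"WER: {word_error_rate(ref, hyp):.3f}%")
```

Real evaluations typically also normalize punctuation, numbers, and spelling variants before alignment, which is what the format issues and data cleaning step addresses.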
15
Word Error Rate (WER) vs. Cost
Service Provider with       00017   00038   00042   00058   00740   13078   Avg. by  Accuracy/
Price Rate                                                                 Provider    $ Ratio

CastingWords (CW)          31.356  33.198  23.273  28.624  16.833  26.452    26.623      1.435
($60/hr per audio)          9.707  17.005  14.885  15.976  11.643  14.129    13.891
                            0.154   0.881   0.822   0.814   1.996   2.119     1.131

Transcription Hub (TH)     30.233  34.628  29.129  33.433  18.071  28.874    29.061      1.899
($45/hr per audio)          8.450  18.405  18.308  18.399   9.036  14.588    14.531
                            0.155   1.022   1.221   1.197   2.495   2.116     1.368

1-888-Type-It-Up (VF)      28.874  26.819  18.543  23.921  12.559  24.072    22.465      0.719
(avg $125/hr per audio)     9.524  11.051  11.175  11.658   6.212  10.977    10.099
                            0.151   1.011   0.662   0.454   2.296   2.120     1.116

oDesk Worker1 (OD1)        31.144  29.787                                    30.465     15.522
($5.56/hr per work)        10.510  16.884                                    13.697
                            0.155   1.098                                     0.626

oDesk Worker2 (OD2)                                        20.066  28.495    24.281      7.777
($11.11/hr per work)                                       12.226  14.973    13.600
                                                            2.591   2.597     2.594

oDesk Worker3 (OD3)                        34.415  37.983                    36.199      5.696
($13.89/hr per work)                       22.545  19.228                    20.886
                                            1.623   1.734                     1.678

Avg. by Interview          30.402  31.108  26.340  30.990  16.883  26.973    28.183
                            9.548  15.836  16.728  16.315   9.779  13.667    14.451
                            0.154   1.003   1.082   1.050   2.345   2.238     1.419
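The Accuracy/$ Ratio column appears to follow from the table's own numbers: taking the second of the three values in each Avg. by Provider cell as the WER, (100 - WER) divided by the hourly price reproduces every ratio shown. That reading is inferred from the arithmetic and is not stated on the slide, so the sketch below should be treated as a consistency check rather than the authors' definition. The names `PROVIDERS` and `accuracy_per_dollar` are hypothetical.

```python
# (price in $ per hour, second value of the "Avg. by Provider" cell, read as WER in %)
PROVIDERS = {
    "CW": (60.00, 13.891),
    "TH": (45.00, 14.531),
    "VF": (125.00, 10.099),
    "OD1": (5.56, 13.697),
    "OD2": (11.11, 13.600),
    "OD3": (13.89, 20.886),
}


def accuracy_per_dollar(wer_percent: float, price_per_hour: float) -> float:
    """Accuracy (100 - WER, in percent) obtained per dollar of hourly price."""
    return (100.0 - wer_percent) / price_per_hour


RATIOS = {
    name: round(accuracy_per_dollar(wer, price), 3)
    for name, (price, wer) in PROVIDERS.items()
}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: {ratio}")
```

Under this reading the oDesk workers score far higher per dollar than the three services, but the ratio does not reflect the additional management effort that oDesk required.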

16
Errors Distribution in WER
[Chart: number of errors (y-axis, 0 to 3000) for each provider (CW, OD, TH, VF), broken down by error category: Misc, Name, Alignment, PostError, Spelling, Revision, Repetition, Filler, RefError, Partial, Background]

17
Hidden Costs
• Management costs beyond Base Price
– Crowdsourcing studies rarely discuss other
costs (other costs dwarf crowd costs…)

• CW, TH and VF are priced higher than oDesk
• But… oDesk: no management cost in the
price rate, but additional effort was needed
– communicate with workers to negotiate price
– clarify requirements, and monitor work
– take risk of low quality or late/no delivery
18
Contributions
• Snapshot in time of current crowdsourcing
transcription providers & offerings beyond AMT
– Those looking for alternatives today
– Retrospective studies

• Quantitative WER vs. cost for spontaneous
speech transcription across providers
• Discussion of tradeoffs among quality, cost,
risk & effort in crowdsourcing transcription
19
Thank You!

Matt Lease
ml@utexas.edu
Slides: www.slideshare.net/mattlease

ir.ischool.utexas.edu

Crowdsourcing Transcription Beyond Mechanical Turk

  • 1. Crowdsourcing Transcription Beyond Mechanical Turk Haofeng Zhou, Denys Baskov, Matthew Lease Matthew Lease School of Information University of Texas at Austin @mattlease ml@utexas.edu
  • 2. Roadmap • Natural Speech: Opportunity & Challenge • Strengths & Limitations of AMT research – e.g. AMT-based Transcription • Qualitative review of 8 transcription providers • Quantitative evaluation of 4 providers • Observations & Contributions 2
  • 3. The Rise of Stored Natural Speech • Conversational speech is the most ubiquitous form of human communication on the planet • We can now capture & store our conversations in new ways & at massive scale • But… need effective technology to search massive conversational speech archives • Oard: “Unlocking the Potential of the Spoken Word” 3
  • 4. Oral History as a Testbed 4
  • 5. oh i'll you know are yeah yeah yeah yeah yeah yeah yeah the very why don't we start with you saying anything in your about grandparents great grandparents well as a small child i remember only one of my grandfathers and his wife his second wife he was selling flour and the type of business it was he didn't even have a store he just a few sacks of different flour and the entrance of an apartment building and people would pass by everyday and buy a chela but two killers of flour we have to remember related times were there was no already baked bread so people had to baked her own bread all the time for some strange reason i do remember fresh rolls where everyone would buy every day but not the bread so that was the business that's how he made a living where was this was the name of the town it wasn't shammay dish he ours is we be and why i as i know in southern poland and alisa are close 5
  • 6. Perfect ASR: Raw Transcription I never left new York before I didn't know anything else so some fellow I knew he said I have a friend that lives in Tucson Arizona so I went to the map looked it up I never heard of Tucson he says I'll write him a letter and when you go there you could stay with him so he did he wrote a letter and his friend he was a dentist he invited me to come over there and spend a week with him 6
  • 7. Rich Transcription [so I didn't] * I never left New York before. I didn't know anything else. So some fellow I knew [mentioned that] <uh> * he said I have a friend that lives [in Arizona] * in Tucson Arizona. So I went to the map looked it up. <um> I never heard of Tucson. <uh and anyhow> He says <well> I'll write him a letter and when you go there you could <uh> stay with him. So he did. He wrote a letter. And his friend, he was a dentist. He invited me to come over there and spend a week with him. 7
  • 8. Transcription Research via AMT
      • Audhkhasi et al. (2011)
      • Evanini et al. (2010)
      • Gruenstein et al. (2009)
      • Lee et al. (2011)
      • Marge et al. (2010)
      • Novotney et al. (2010)
      • Parent et al. (2010)
      • Williams et al. (2011)
  • 10. Why Eytan Adar hates MTurk Research (CHI 2011 CHC Workshop)
      • Overly-narrow focus on MTurk
          – Identify general vs. platform-specific problems
          – Academic vs. Industrial problems
      • How much should we focus on “...writing the user’s manual for MTurk ... struggl[ing] against the limits of the platform...”?
  • 11. HCOMP 2013 Panel
      Anand Kulkarni: “How do we dramatically reduce the complexity of getting work done with the crowd?”
      Greg Little: How can we post a task and with 98% confidence know we’ll get a quality result?
  • 12. Beyond AMT: An Analysis of Crowd Work Platforms
      • Vakharia & Lease, arXiv online 2013
      • Near-exclusive research focus on AMT risks its particular vagaries and limitations overly shaping our understanding of crowd work and the research questions and directions being pursued.
      • We present a cross-platform content analysis of seven crowd work platforms.
  • 14. Qualitative Analysis
      • Base Price
      • Accuracy
      • Transcript Formats
      • Time stamps
      • Speaker Identification/Changes
      • Verbatim
      • Turnaround Time
      • Difficult Audio Surcharge
  • 15. Experiment
      • 10-minute segments from 6 interviews
          – USC-SFI MALACH English corpus (LDC2012S05)
      • 4 low-cost service providers
          – CastingWords (CW)
          – Transcription Hub (TH)
          – 1-888-Type-It-Up (VerbalFusion, VF)
          – oDesk: 3 workers
      • Format Issues & Data Cleaning
      • Aligned with revised CMU Sphinx code
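A minimal sketch of the word error rate (WER) metric used in the evaluation, assuming lowercasing and plain whitespace tokenization. It is a generic word-level edit-distance computation, not the revised CMU Sphinx alignment code mentioned on the slide; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing in hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


print(word_error_rate("i never heard of tucson", "i never heard of two sun"))
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.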
  • 16. Word Error Rate (WER) vs. Cost
      Three values are listed per interview transcript, in the order shown on the slide.

      CastingWords (CW), $60/hr per audio. Accuracy/$ Ratio: 1.435
          Interview          Values
          00017              31.356   9.707   0.154
          00038              33.198  17.005   0.881
          00042              23.273  14.885   0.822
          00058              28.624  15.976   0.814
          00740              16.833  11.643   1.996
          13078              26.452  14.129   2.119
          Avg. by Provider   26.623  13.891   1.131

      Transcription Hub (TH), $45/hr per audio. Accuracy/$ Ratio: 1.899
          Interview          Values
          00017              30.233   8.450   0.155
          00038              34.628  18.405   1.022
          00042              29.129  18.308   1.221
          00058              33.433  18.399   1.197
          00740              18.071   9.036   2.495
          13078              28.874  14.588   2.116
          Avg. by Provider   29.061  14.531   1.368

      1-888-Type-It-Up (VF), avg $125/hr per audio. Accuracy/$ Ratio: 0.719
          Interview          Values
          00017              28.874   9.524   0.151
          00038              26.819  11.051   1.011
          00042              18.543  11.175   0.662
          00058              23.921  11.658   0.454
          00740              12.559   6.212   2.296
          13078              24.072  10.977   2.120
          Avg. by Provider   22.465  10.099   1.116

      oDesk Worker1 (OD1), $5.56/hr per work. Accuracy/$ Ratio: 15.522
          Interview          Values
          00017              31.144  10.510   0.155
          00038              29.787  16.884   1.098
          Avg. by Provider   30.465  13.697   0.626

      oDesk Worker2 (OD2), $11.11/hr per work. Accuracy/$ Ratio: 7.777
          Interview          Values
          00740              20.066  12.226   2.591
          13078              28.495  14.973   2.597
          Avg. by Provider   24.281  13.600   2.594

      oDesk Worker3 (OD3), $13.89/hr per work. Accuracy/$ Ratio: 5.696
          Interview          Values
          00042              34.415  22.545   1.623
          00058              37.983  19.228   1.734
          Avg. by Provider   36.199  20.886   1.678

      Avg. by Interview
          Interview          Values
          00017              30.402   9.548   0.154
          00038              31.108  15.836   1.003
          00042              26.340  16.728   1.082
          00058              30.990  16.315   1.050
          00740              16.883   9.779   2.345
          13078              26.973  13.667   2.238
          Overall            28.183  14.451   1.419
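The Accuracy/$ Ratio figures appear consistent with (100 − average WER) divided by the hourly price, using the second of the three per-provider averages. That formula is inferred from the numbers rather than stated on the slide, so the sketch below is an assumption to check; the function name `accuracy_per_dollar` is hypothetical.

```python
def accuracy_per_dollar(avg_wer_percent: float, price_per_hour: float) -> float:
    """Assumed formula: accuracy (100 - WER, in percent) per dollar of hourly price."""
    return (100.0 - avg_wer_percent) / price_per_hour


# (second per-provider average from the slide, price in $/hr from the slide)
providers = {
    "CW": (13.891, 60.00),
    "TH": (14.531, 45.00),
    "VF": (10.099, 125.00),
    "OD1": (13.697, 5.56),
    "OD2": (13.600, 11.11),
    "OD3": (20.886, 13.89),
}

ratios = {
    name: round(accuracy_per_dollar(wer, price), 3)
    for name, (wer, price) in providers.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Under this reading the oDesk workers rank highest on the ratio because their hourly prices are far lower, while their accuracy is comparable to or below that of the other providers. The ratio does not capture the management effort discussed under hidden costs.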
  • 17. Errors Distribution in WER
      [Chart: number of errors (y-axis "Errors", 0 to 3000) per provider (CW, OD, TH, VF), broken down by error category: Misc, Name, Alignment, PostError, Spelling, Revision, Repetition, Filler, RefError, Partial, Background]
  • 18. Hidden Costs
      • Management costs beyond Base Price
          – Crowdsourcing studies rarely discuss other costs (other costs dwarf crowd costs…)
      • CW, TH and VF are priced higher than oDesk
      • But… oDesk: no management cost in the price rate, but additional effort was needed
          – communicate with workers to negotiate price
          – clarify requirements, and monitor work
          – take risk of low quality or late/no delivery
  • 19. Contributions
      • Snapshot in time of current crowdsourcing transcription providers & offerings beyond AMT
          – Those looking for alternatives today
          – Retrospective studies
      • Quantitative WER vs. cost for spontaneous speech transcription across providers
      • Discussion of tradeoffs among quality, cost, risk & effort in crowdsourcing transcription
  • 20. Thank You!
      Matt Lease
      ml@utexas.edu
      Slides: www.slideshare.net/mattlease
      ir.ischool.utexas.edu