What Can Machine Learning & Crowdsourcing Do for You?
Exploring New Tools for Scalable Data Processing
Matt Lease
School of Information, University of Texas at Austin
ml@utexas.edu · @mattlease
Slides: slideshare.net/mattlease
What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 65 universities around the world
www.ischools.org
Roadmap
• Machine Learning (AI) lets us automate many
useful tasks, e.g., natural language processing (NLP)
• Crowdsourcing enables new levels of efficiency &
scalability in data collection & processing
• Human Computation lets us build next-generation
applications today, with capabilities beyond AI
Motivation: Applications
Automatic/Hybrid Fact Checking
• http://fcweb.pythonanywhere.com
– Nguyen et al., AAAI 2018
MemeBrowser
• http://odyssey.ischool.utexas.edu/mb/
– Ryu et al., HyperText 2012
Dating Biographies without Time Mentions
• Kumar et al., CIKM 2011
Plato (428-348 B.C.) Lincoln (1809-1865)
Transcription & Copy-Editing
• Spontaneous speech is often disfluent, with repetitions,
corrections, and vocalized space-fillers
• Lease, Charniak, and Johnson, 2005
• Zhou, Baskov, and Lease, 2013 (& Zhou’s Thesis)
S1: Uh first um i need to know uh how do you feel about uh about
sending uh an elderly uh family member to a nursing home
S2: Well of course it's you know it's one of the last few things in the
world you'd ever want to do you know unless it's just you know really
you know uh for their uh you know for their own good
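To make the copy-editing task concrete, here is a minimal rule-based sketch that strips a few fillers and collapses immediately repeated words. It is a naive baseline for illustration only, not the method of the papers cited above; the filler list and the function name `strip_fillers` are assumptions.

```python
import re

# Illustrative filler inventory (an assumption, not taken from the cited papers).
FILLERS = ("you know", "uh", "um")


def strip_fillers(utterance: str) -> str:
    """Remove listed fillers, then collapse immediately repeated words."""
    text = utterance
    for filler in FILLERS:
        # Word boundaries keep "um" from matching inside words like "number".
        text = re.sub(rf"\b{re.escape(filler)}\b", " ", text, flags=re.IGNORECASE)
    cleaned = []
    for word in text.split():
        # Drop a word if it repeats the previous one ("about about").
        if not cleaned or cleaned[-1].lower() != word.lower():
            cleaned.append(word)
    return " ".join(cleaned)


s1 = ("Uh first um i need to know uh how do you feel about uh about "
      "sending uh an elderly uh family member to a nursing home")
print(strip_fillers(s1))
```

Rules like these over-delete (a literal "do you know him" loses "you know") and miss multi-word restarts such as "for their ... for their", which is part of why the task is usually treated as a learning problem rather than a pattern-matching one.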
Two Problems
Machine Learning - Supervised
Slide courtesy of Byron Wallace (Northeastern)
AI effectiveness is often limited by training data size
Problem: creating labeled data is expensive!
Banko and Brill (2001)
What do we do when state-of-art AI
still isn’t good enough?
Crowdsourcing
Crowdsourcing
• Jeff Howe. Wired, June 2006.
• Take a job traditionally
performed by a known agent
(often an employee)
• Outsource it to an undefined,
generally large group of
people via an open call
Volunteer Crowd Success Stories
Zooniverse
Amazon Mechanical Turk (MTurk)
• Marketplace for paid crowd work (“micro-tasks”)
– Created in 2005 (remains in “beta” today)
• On-demand, scalable, 24/7 global workforce
• API lets human labor be integrated into software
– “You’ve heard of software-as-a-service. Now this is human-as-a-service.”
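As a rough sketch of what "integrating human labor into software" can look like, the snippet below assembles a request for MTurk's CreateHIT operation and posts it through the `boto3` client against the requester sandbox. The helper names `build_hit_request` and `post_hit`, the reward, and the timing values are illustrative assumptions; the question XML is left as a caller-supplied placeholder, and posting requires AWS credentials and an MTurk requester account.

```python
SANDBOX_ENDPOINT = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"


def build_hit_request(title, description, question_xml,
                      reward_usd="0.05", assignments=3):
    """Assemble keyword arguments for the MTurk CreateHIT operation."""
    return {
        "Title": title,
        "Description": description,
        "Question": question_xml,              # task definition XML, supplied by the caller
        "Reward": reward_usd,                  # US dollars, passed as a string
        "MaxAssignments": assignments,         # number of distinct workers per task
        "LifetimeInSeconds": 24 * 60 * 60,     # how long the task stays listed
        "AssignmentDurationInSeconds": 10 * 60,  # time allowed per worker
    }


def post_hit(request):
    """Post the task to the sandbox and return its HIT id (needs credentials)."""
    import boto3  # third-party dependency, imported only when actually posting

    client = boto3.client("mturk", region_name="us-east-1",
                          endpoint_url=SANDBOX_ENDPOINT)
    response = client.create_hit(**request)
    return response["HIT"]["HITId"]


request = build_hit_request(
    "Label a tweet",
    "Is this tweet positive or negative?",
    "<placeholder/>",  # hypothetical stand-in for real question XML
)
```

Asking for several assignments per task is what later makes consensus labeling possible, since each item is then judged by more than one worker.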
Collecting Data from Crowds
2008: MTurk sparks “gold rush” for ML training data
• Information Retrieval: Alonso et al., SIGIR Forum
• Human-Computer Interaction: Kittur et al., CHI
• Computer Vision: Sorokin & Forsyth, CVPR
• NLP: Snow et al., EMNLP
– Annotating human language
– 22,000 labels for only US $26
– Crowd’s consensus labels can
replace traditional expert labels
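One simple way to form a consensus label is a majority vote over the redundant labels collected for each item. The sketch below shows that idea together with the per-label cost implied by the figures above; it is not necessarily the aggregation scheme used by Snow et al., and the item ids and labels are made up for illustration.

```python
from collections import Counter


def majority_vote(labels_by_item):
    """Map each item to its most frequent label (ties broken alphabetically)."""
    consensus = {}
    for item, labels in labels_by_item.items():
        counts = Counter(labels)
        top = max(counts.values())
        # Sorting the tied labels keeps the result deterministic.
        consensus[item] = sorted(l for l, c in counts.items() if c == top)[0]
    return consensus


# Hypothetical crowd labels: several workers judge each item.
crowd_labels = {
    "item-1": ["pos", "pos", "neg"],
    "item-2": ["neg", "neg", "neg"],
    "item-3": ["pos", "neg"],  # tie
}
consensus = majority_vote(crowd_labels)

# Cost per label implied by the slide: US $26 for 22,000 labels.
cost_per_label = 26 / 22000  # roughly a tenth of a cent
```

Majority voting treats every worker as equally reliable; weighting votes by estimated worker accuracy is a common refinement when label quality varies.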
Human Computation
ACM Queue, May 2006
“Software developers with innovative ideas for
businesses and technologies are constrained by the
limits of artificial intelligence… If software developers
could programmatically access and incorporate human
intelligence into their applications, a whole new class
of innovative businesses and applications would be
possible. This is the goal of Amazon Mechanical Turk…
people are freer to innovate because they can now
imbue software with real human intelligence.”
PlateMate: Counting Calories
Noronha et al., UIST’11
MonoTrans
Translation by Monolingual Speakers + AI
Bederson et al., 2010; Morita & Ishida, 2009
Zensors
Laput et al., CSCW 2015
But Who Protects the Moderators?
Dang et al., HCOMP’18 & CI’18
What about ethics?
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of these people
who we ask to power our computing?”
• Irani and Silberman (2013)
– “…by hiding workers behind web forms and APIs…
employers see themselves as builders of innovative
technologies, rather than… unconcerned with working
conditions… redirecting focus to the innovation of human
computation as a field of technological achievement.”
• Fort, Adda, and Cohen (2011)
– “…opportunities for our community to deliberately
value ethics above cost savings.”
Summary
• Machine Learning (AI) lets us automate many
useful tasks, e.g., natural language processing (NLP)
• Crowdsourcing enables new levels of efficiency &
scalability in data collection & processing
• Human Computation lets us build next-generation
applications today, with capabilities beyond AI
The Future of Crowd Work
Paper @ CSCW 2013 by
Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Matt Lease - ml@utexas.edu - @mattlease
Thank You!
Slides: slideshare.net/mattlease
Lab: ir.ischool.utexas.edu

More Related Content

What's hot

The Future of work and impact on the technology worker
The Future of work and impact on the technology workerThe Future of work and impact on the technology worker
The Future of work and impact on the technology workerPeter Cosgrove
 
The Impact of Automation & AI in the Workplace
The Impact of Automation & AI in the WorkplaceThe Impact of Automation & AI in the Workplace
The Impact of Automation & AI in the WorkplaceCollabor8now Ltd
 
Data urban service science 20130617 v2
Data urban service science 20130617 v2Data urban service science 20130617 v2
Data urban service science 20130617 v2ISSIP
 
20220103 jim spohrer hicss v9
20220103 jim spohrer hicss v920220103 jim spohrer hicss v9
20220103 jim spohrer hicss v9ISSIP
 
AI & Business - Opportunities & Dangers
AI & Business - Opportunities & DangersAI & Business - Opportunities & Dangers
AI & Business - Opportunities & Dangerswillmurphy
 
People's Interactions with Cognitive Assistants for Enhanced Performance
People's Interactions with Cognitive Assistants for Enhanced PerformancePeople's Interactions with Cognitive Assistants for Enhanced Performance
People's Interactions with Cognitive Assistants for Enhanced PerformanceMd. Abul Kalam Siddike
 
Frontiers sutton spohrer 20150711 v2
Frontiers sutton spohrer 20150711 v2Frontiers sutton spohrer 20150711 v2
Frontiers sutton spohrer 20150711 v2ISSIP
 
20211103 jim spohrer oecd ai_science_productivity_panel v5
20211103 jim spohrer oecd ai_science_productivity_panel v520211103 jim spohrer oecd ai_science_productivity_panel v5
20211103 jim spohrer oecd ai_science_productivity_panel v5ISSIP
 
20210322 jim spohrer eaae deans summit v13
20210322 jim spohrer eaae deans summit v1320210322 jim spohrer eaae deans summit v13
20210322 jim spohrer eaae deans summit v13ISSIP
 
Artificial Intelligence (AI) and Job Loss
Artificial Intelligence (AI) and Job LossArtificial Intelligence (AI) and Job Loss
Artificial Intelligence (AI) and Job LossIkhlaq Sidhu
 
Effects of ai on job market
Effects of ai on job marketEffects of ai on job market
Effects of ai on job marketOmar Ahmed
 
Will robots take our jobs (short version) for Women Techmakers Talk
Will robots take our jobs (short version) for Women Techmakers TalkWill robots take our jobs (short version) for Women Techmakers Talk
Will robots take our jobs (short version) for Women Techmakers TalkAva Meredith
 
Korea day1 keynote 20161013 v6
Korea day1 keynote 20161013 v6Korea day1 keynote 20161013 v6
Korea day1 keynote 20161013 v6ISSIP
 
Smart Machines: Driving the 4th Industrial Revolution?
Smart Machines: Driving the 4th Industrial Revolution?Smart Machines: Driving the 4th Industrial Revolution?
Smart Machines: Driving the 4th Industrial Revolution?Bijilash Babu
 
Applying Machine Learning and Artificial Intelligence to Business
Applying Machine Learning and Artificial Intelligence to BusinessApplying Machine Learning and Artificial Intelligence to Business
Applying Machine Learning and Artificial Intelligence to BusinessRussell Miles
 
20210519 jim spohrer sir rel future_ai v14
20210519 jim spohrer sir rel future_ai v1420210519 jim spohrer sir rel future_ai v14
20210519 jim spohrer sir rel future_ai v14ISSIP
 
How Artificial Intelligence is taking over Human Jobs
How Artificial Intelligence is taking over Human JobsHow Artificial Intelligence is taking over Human Jobs
How Artificial Intelligence is taking over Human JobsShradha Jindal
 
Japan 20200724 v13
Japan 20200724 v13Japan 20200724 v13
Japan 20200724 v13ISSIP
 

What's hot (20)

The Future of work and impact on the technology worker
The Future of work and impact on the technology workerThe Future of work and impact on the technology worker
The Future of work and impact on the technology worker
 
The Impact of Automation & AI in the Workplace
The Impact of Automation & AI in the WorkplaceThe Impact of Automation & AI in the Workplace
The Impact of Automation & AI in the Workplace
 
Data urban service science 20130617 v2
Data urban service science 20130617 v2Data urban service science 20130617 v2
Data urban service science 20130617 v2
 
20220103 jim spohrer hicss v9
20220103 jim spohrer hicss v920220103 jim spohrer hicss v9
20220103 jim spohrer hicss v9
 
Skills Requirements for Future Jobs - 10 Facts
Skills Requirements for Future Jobs - 10 FactsSkills Requirements for Future Jobs - 10 Facts
Skills Requirements for Future Jobs - 10 Facts
 
AI & Business - Opportunities & Dangers
AI & Business - Opportunities & DangersAI & Business - Opportunities & Dangers
AI & Business - Opportunities & Dangers
 
People's Interactions with Cognitive Assistants for Enhanced Performance
People's Interactions with Cognitive Assistants for Enhanced PerformancePeople's Interactions with Cognitive Assistants for Enhanced Performance
People's Interactions with Cognitive Assistants for Enhanced Performance
 
Frontiers sutton spohrer 20150711 v2
Frontiers sutton spohrer 20150711 v2Frontiers sutton spohrer 20150711 v2
Frontiers sutton spohrer 20150711 v2
 
20211103 jim spohrer oecd ai_science_productivity_panel v5
20211103 jim spohrer oecd ai_science_productivity_panel v520211103 jim spohrer oecd ai_science_productivity_panel v5
20211103 jim spohrer oecd ai_science_productivity_panel v5
 
20210322 jim spohrer eaae deans summit v13
20210322 jim spohrer eaae deans summit v1320210322 jim spohrer eaae deans summit v13
20210322 jim spohrer eaae deans summit v13
 
Artificial Intelligence (AI) and Job Loss
Artificial Intelligence (AI) and Job LossArtificial Intelligence (AI) and Job Loss
Artificial Intelligence (AI) and Job Loss
 
Effects of ai on job market
Effects of ai on job marketEffects of ai on job market
Effects of ai on job market
 
Will robots take our jobs (short version) for Women Techmakers Talk
Will robots take our jobs (short version) for Women Techmakers TalkWill robots take our jobs (short version) for Women Techmakers Talk
Will robots take our jobs (short version) for Women Techmakers Talk
 
The impact of AI on work
The impact of AI on workThe impact of AI on work
The impact of AI on work
 
Korea day1 keynote 20161013 v6
Korea day1 keynote 20161013 v6Korea day1 keynote 20161013 v6
Korea day1 keynote 20161013 v6
 
Smart Machines: Driving the 4th Industrial Revolution?
Smart Machines: Driving the 4th Industrial Revolution?Smart Machines: Driving the 4th Industrial Revolution?
Smart Machines: Driving the 4th Industrial Revolution?
 
Applying Machine Learning and Artificial Intelligence to Business
Applying Machine Learning and Artificial Intelligence to BusinessApplying Machine Learning and Artificial Intelligence to Business
Applying Machine Learning and Artificial Intelligence to Business
 
20210519 jim spohrer sir rel future_ai v14
20210519 jim spohrer sir rel future_ai v1420210519 jim spohrer sir rel future_ai v14
20210519 jim spohrer sir rel future_ai v14
 
How Artificial Intelligence is taking over Human Jobs
How Artificial Intelligence is taking over Human JobsHow Artificial Intelligence is taking over Human Jobs
How Artificial Intelligence is taking over Human Jobs
 
Japan 20200724 v13
Japan 20200724 v13Japan 20200724 v13
Japan 20200724 v13
 

Similar to What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for Scalable Data Processing

CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human ComputationCUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human ComputationCUbRIK Project
 
Robotisation of Knowledge and Service Work
Robotisation of Knowledge and Service WorkRobotisation of Knowledge and Service Work
Robotisation of Knowledge and Service WorkDr. Crispin Coombs
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Matthew Lease
 
Artificial Intelligence and Machine Learning
Artificial Intelligence and Machine LearningArtificial Intelligence and Machine Learning
Artificial Intelligence and Machine LearningMykola Dobrochynskyy
 
Deep Learning for AI - Yoshua Bengio, Mila
Deep Learning for AI - Yoshua Bengio, MilaDeep Learning for AI - Yoshua Bengio, Mila
Deep Learning for AI - Yoshua Bengio, MilaLucidworks
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?Matthew Lease
 
GENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICS
GENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICSGENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICS
GENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICSNITHYA637064
 
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Agence du Numérique (AdN)
 
Spohrer PHD_ICT_KES 20230316 v10.pptx
Spohrer PHD_ICT_KES 20230316 v10.pptxSpohrer PHD_ICT_KES 20230316 v10.pptx
Spohrer PHD_ICT_KES 20230316 v10.pptxISSIP
 
Seminar 20221027 v4.pptx
Seminar 20221027 v4.pptxSeminar 20221027 v4.pptx
Seminar 20221027 v4.pptxISSIP
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryPaco Nathan
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1DianaGray10
 
Art of artificial intelligence and automation
Art of artificial intelligence and automationArt of artificial intelligence and automation
Art of artificial intelligence and automationLiew Wei Da Andrew
 
Webinar on AI in IoT applications KCG Connect Alumni Digital Series by Rajkumar
Webinar on AI in IoT applications KCG Connect Alumni Digital Series by RajkumarWebinar on AI in IoT applications KCG Connect Alumni Digital Series by Rajkumar
Webinar on AI in IoT applications KCG Connect Alumni Digital Series by RajkumarRajkumar R
 
Metrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMetrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMatthew Lease
 
Emotional intelligence and artificial intelligence (A comparative analysis)
Emotional intelligence and artificial intelligence (A comparative analysis)Emotional intelligence and artificial intelligence (A comparative analysis)
Emotional intelligence and artificial intelligence (A comparative analysis)Rumbidzai Faith Matanga
 
Semiconductors 20240320 v14 corrected slides.pptx
Semiconductors 20240320 v14 corrected slides.pptxSemiconductors 20240320 v14 corrected slides.pptx
Semiconductors 20240320 v14 corrected slides.pptxISSIP
 

Similar to What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for Scalable Data Processing (20)

CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human ComputationCUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
 
Robotisation of Knowledge and Service Work
Robotisation of Knowledge and Service WorkRobotisation of Knowledge and Service Work
Robotisation of Knowledge and Service Work
 
Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)Rise of Crowd Computing (December 2012)
Rise of Crowd Computing (December 2012)
 
Artificial Intelligence and Machine Learning
Artificial Intelligence and Machine LearningArtificial Intelligence and Machine Learning
Artificial Intelligence and Machine Learning
 
Deep Learning for AI - Yoshua Bengio, Mila
Deep Learning for AI - Yoshua Bengio, MilaDeep Learning for AI - Yoshua Bengio, Mila
Deep Learning for AI - Yoshua Bengio, Mila
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
 
GENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICS
GENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICSGENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICS
GENERATIVE ARTIFICIAL INTELLIGENCE &DATA ANALYTICS
 
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
Présentation de Bruno Schroder au 20e #mforum (07/12/2016)
 
When AI becomes a data-driven machine, and digital is everywhere!
When AI becomes a data-driven machine, and digital is everywhere!When AI becomes a data-driven machine, and digital is everywhere!
When AI becomes a data-driven machine, and digital is everywhere!
 
Spohrer PHD_ICT_KES 20230316 v10.pptx
Spohrer PHD_ICT_KES 20230316 v10.pptxSpohrer PHD_ICT_KES 20230316 v10.pptx
Spohrer PHD_ICT_KES 20230316 v10.pptx
 
Seminar 20221027 v4.pptx
Seminar 20221027 v4.pptxSeminar 20221027 v4.pptx
Seminar 20221027 v4.pptx
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industry
 
NHH 20231105 v6.pptx
NHH 20231105 v6.pptxNHH 20231105 v6.pptx
NHH 20231105 v6.pptx
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
Art of artificial intelligence and automation
Art of artificial intelligence and automationArt of artificial intelligence and automation
Art of artificial intelligence and automation
 
Webinar on AI in IoT applications KCG Connect Alumni Digital Series by Rajkumar
Webinar on AI in IoT applications KCG Connect Alumni Digital Series by RajkumarWebinar on AI in IoT applications KCG Connect Alumni Digital Series by Rajkumar
Webinar on AI in IoT applications KCG Connect Alumni Digital Series by Rajkumar
 
Ai
AiAi
Ai
 
Metrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-ComputingMetrocon-Rise-Of-Crowd-Computing
Metrocon-Rise-Of-Crowd-Computing
 
Emotional intelligence and artificial intelligence (A comparative analysis)
Emotional intelligence and artificial intelligence (A comparative analysis)Emotional intelligence and artificial intelligence (A comparative analysis)
Emotional intelligence and artificial intelligence (A comparative analysis)
 
Semiconductors 20240320 v14 corrected slides.pptx
Semiconductors 20240320 v14 corrected slides.pptxSemiconductors 20240320 v14 corrected slides.pptx
Semiconductors 20240320 v14 corrected slides.pptx
 

More from Matthew Lease

Automated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesAutomated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesMatthew Lease
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Matthew Lease
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopMatthew Lease
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Matthew Lease
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd Matthew Lease
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Matthew Lease
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Matthew Lease
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Matthew Lease
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Matthew Lease
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information RetrievalMatthew Lease
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Matthew Lease
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesMatthew Lease
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingMatthew Lease
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingMatthew Lease
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkMatthew Lease
 
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Matthew Lease
 
Crowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine EvaluationCrowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine EvaluationMatthew Lease
 
Crowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical TurkCrowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical TurkMatthew Lease
 
Crowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to EthicsCrowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to EthicsMatthew Lease
 
Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & ethics: a few thoughts and refences. Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & ethics: a few thoughts and refences. Matthew Lease
 

More from Matthew Lease (20)

Automated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesAutomated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey Responses
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Systematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s ClothingSystematic Review is e-Discovery in Doctor’s Clothing
Systematic Review is e-Discovery in Doctor’s Clothing
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd Work
 
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
 
Crowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine EvaluationCrowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine Evaluation
 
Crowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical TurkCrowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical Turk
 
Crowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to EthicsCrowdsourcing for Information Retrieval: From Statistics to Ethics
Crowdsourcing for Information Retrieval: From Statistics to Ethics
 
Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & ethics: a few thoughts and refences. Crowdsourcing & ethics: a few thoughts and refences.
Crowdsourcing & ethics: a few thoughts and refences.
 

What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for Scalable Data Processing

  • 1. What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for Scalable Data Processing Matt Lease School of Information @mattlease University of Texas at Austin ml@utexas.edu Slides: slideshare.net/mattlease
  • 2. “The place where people & technology meet” ~ Wobbrock et al., 2009 “iSchools” now exist at 65 universities around the world www.ischools.org What’s an Information School? 2
  • 3. • Machine Learning (AI) lets us automate many useful tasks, e.g., natural language processing (NLP) • Crowdsourcing enables new levels of efficiency & scalability in data collection & processing • Human Computation lets us build next-generation applications today, with capabilities beyond AI Roadmap
  • 5. Automatic/Hybrid Fact Checking • http://fcweb.pythonanywhere.com – Nguyen et al., AAAI 2018 5
  • 6. • http://odyssey.ischool.utexas.edu/mb/ – Ryu et al., HyperText 2012 MemeBrowser 6
  • 7. • Kumar et al., CIKM 2011 Dating Biographies without Time Mentions Plato (428-348 B.C.) Lincoln (1809-1865) 7
  • 8. Transcription & Copy-Editing • Spontaneous speech is often disfluent, with repetitions, corrections, and vocalized space-fillers • Lease, Charniak, and Johnson, 2005 • Zhou, Baskov, and Lease, 2013 (& Zhou’s Thesis) S1: Uh first um i need to know uh how do you feel about uh about sending uh an elderly uh family member to a nursing home S2: Well of course it's you know it's one of the last few things in the world you'd ever want to do you know unless it's just you know really you know uh for their uh you know for their own good
  • 9. Transcription & Copy-Editing • Spontaneous speech is often disfluent, with repetitions, corrections, and vocalized space-fillers • Lease, Charniak, and Johnson, 2005 • Zhou, Baskov, and Lease, 2013 (& Zhou’s Thesis) S1: Uh first um i need to know uh how do you feel about uh about sending uh an elderly uh family member to a nursing home S2: Well of course it's you know it's one of the last few things in the world you'd ever want to do you know unless it's just you know really you know uh for their uh you know for their own good
  • 11. Machine Learning - Supervised Slide courtesy of Byron Wallace (Northeastern) 11
  • 12. AI effectiveness is often limited by training data size Problem: creating labeled data is expensive! Banko and Brill (2001)
  • 13. What do we do when state-of-the-art AI still isn’t good enough?
  • 15. Crowdsourcing • Jeff Howe. Wired, June 2006. • Take a job traditionally performed by a known agent (often an employee) • Outsource it to an undefined, generally large group of people via an open call 15
  • 18. • Marketplace for paid crowd work (“micro-tasks”) – Created in 2005 (remains in “beta” today) • On-demand, scalable, 24/7 global workforce • API lets human labor be integrated into software – “You’ve heard of software-as-a-service. Now this is human-as-a-service.” Amazon Mechanical Turk (MTurk)
  • 19. Collecting Data from Crowds 2008: MTurk sparks “gold rush” for ML training data • Information Retrieval: Alonso et al., SIGIR Forum • Human-Computer Interaction: Kittur et al., CHI • Computer Vision: Sorokin & Forsyth, CVPR • NLP: Snow et al., EMNLP – Annotating human language – 22,000 labels for only US $26 – Crowd’s consensus labels can replace traditional expert labels
  • 22. ACM Queue, May 2006 22 “Software developers with innovative ideas for businesses and technologies are constrained by the limits of artificial intelligence… If software developers could programmatically access and incorporate human intelligence into their applications, a whole new class of innovative businesses and applications would be possible. This is the goal of Amazon Mechanical Turk… people are freer to innovate because they can now imbue software with real human intelligence.”
  • 23. PlateMate: Counting Calories Noronha et al., UIST’10 23
  • 24. Bederson et al., 2010; Morita & Ishida, 2009 MonoTrans Translation by Monolingual Speakers + AI 24
  • 25. Zensors Laput et al., CSCW 2015 25
  • 26. But Who Protects the Moderators? Dang et al., HCOMP’18 & CI’18 26
  • 27. What about ethics? • Silberman, Irani, and Ross (2010) – “How should we… conceptualize the role of these people who we ask to power our computing?” • Irani and Silberman (2013) – “…by hiding workers behind web forms and APIs… employers see themselves as builders of innovative technologies, rather than… unconcerned with working conditions… redirecting focus to the innovation of human computation as a field of technological achievement.” • Fort, Adda, and Cohen (2011) – “…opportunities for our community to deliberately value ethics above cost savings.” 27
  • 28. Summary • Machine Learning (AI) lets us automate many useful tasks, e.g., natural language processing (NLP) • Crowdsourcing enables new levels of efficiency & scalability in data collection & processing • Human Computation lets us build next-generation applications today, with capabilities beyond AI
  • 29. The Future of Crowd Work Paper @ CSCW 2013 by Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton 29
  • 30. Matt Lease - ml@utexas.edu - @mattlease Thank You! Slides: slideshare.net/mattlease Lab: ir.ischool.utexas.edu
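
Slide 19 notes that a crowd's consensus labels can replace traditional expert labels. One simple way to form such a consensus is majority voting over redundant worker labels. The sketch below is illustrative only: the function name, the (item, worker, label) tuple layout, the tie-breaking rule, and the sample data are all assumptions, not something taken from the talk or the cited papers.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Aggregate redundant crowd labels into one consensus label per item.

    `labels` is an iterable of (item_id, worker_id, label) tuples
    (a hypothetical layout chosen for this example).
    Returns a dict mapping item_id -> consensus label.
    """
    votes = defaultdict(Counter)
    for item_id, _worker_id, label in labels:
        votes[item_id][label] += 1

    consensus = {}
    for item_id, counts in votes.items():
        # Pick the most frequent label; break ties alphabetically so the
        # result is deterministic (an arbitrary choice for this sketch).
        best_label, _ = min(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
        consensus[item_id] = best_label
    return consensus


# Made-up sample data: three items, each labeled by several workers.
crowd_labels = [
    ("t1", "w1", "pos"), ("t1", "w2", "pos"), ("t1", "w3", "neg"),
    ("t2", "w1", "neg"), ("t2", "w2", "neg"), ("t2", "w3", "pos"),
    ("t3", "w1", "pos"), ("t3", "w2", "neg"),
]

result = majority_vote(crowd_labels)
print(result)
```

Majority voting treats every worker as equally reliable. It is the simplest baseline, and weighting votes by estimated worker accuracy is a common refinement.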