Crowdsourced Data Processing:
Industry and Academic Perspectives
Adam Marcus and
Aditya Parameswaran
1
A Tutorial in Three Parts
Part 0: A (Super Short) Survey of Part 1 and 2, plus
Background (Me)
Part 0.1: Background + Survey of Part 1
Part 0.2: Survey of Part 2
Part 1: A Survey of Crowd-Powered Data
Processing in Academia (Me)
Part 2: A Survey of Crowd-Powered Data
Processing in Industry (Adam)
2
Part 0.1
(background and
survey of Part 1)
3
What is crowdsourcing?
• Our definition: [Von Ahn]
Crowdsourcing is a paradigm that
utilizes human processing power
to solve problems that computers
cannot yet solve.
e.g., processing and
understanding images, videos,
and text.
(80% or more of all data, according to a
five-year-old IBM study)
4
Items, e.g.,
images, text
Why is it important?
We’re on the cusp of an AI
revolution [NYT, July’16]:
– “a transformation many believe
will have a payoff on the scale of
… personal computing … or the
internet”
AI requires large volumes of
training data.
Our best hope of understanding
images, videos, and text, comes
from humans
5
How does one deploy crowdsourcing?
• Our focus: paid crowdsourcing
– Other ways: volunteer, gaming
– “paid” is broad: $$, pigs on your farm, MBs, bitcoin, …
• A typical paid platform:
– Requesters put jobs up, assign rewards
– Workers pick up and work on these jobs, get rewards
6
Our Focus: Data, Data, Data
How do we get crowds to process large volumes
of data efficiently and effectively?
– Design of algorithms
– Design of systems
We call this “crowdsourced data processing”
This is the primary concern of industry users.
7
Context: Other Work
8
Crowdsourced data processing depends on many
other fields …. (but not the focus of this tutorial)
Humans = Data Processors
Our abstraction: Humans are Data Processors
• compare two items
• rate an item
• evaluate a predicate on an item
Human operator set is not fully known or understood!
9
10
Boolean
Question
11
K-Ary
Question
But: Unlike Computer Processors
… Humans cost money, take time, and make mistakes
Cost: How much am I willing to spend?
Latency: How long can I wait?
Quality: What is my desired quality?
So, algorithm development has to be done “ab initio”
12
Illustration of Challenges: Sorting
Sort n animals on “dangerousness”
• Option 1: give it all to one human worker – could
take very long, likely error prone.
• Option 2: apply a sorting algorithm, with pairwise
comparisons being done by humans instead of
automatically
13
Illustration of Challenges: Sorting
• Option 2: But:
– Workers may make mistakes! So how do you
know if you can trust a worker response?
– Cycles may form
– Should we get more worker answers for the same
pair or for different pairs?
14
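Option 2 can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular paper: the worker is simulated, and `make_noisy_worker`, `crowd_less_than`, `crowd_sort`, the `votes` parameter, and the animal ranking are all hypothetical names and values chosen for the example. Each pair is asked several times and the majority answer is used, which is one simple response to the "can I trust a worker response?" question.

```python
import random
from functools import cmp_to_key


def make_noisy_worker(true_rank, error_rate, rng):
    """Simulated worker: answers "is a less dangerous than b?" and is wrong with probability error_rate."""
    def worker(a, b):
        truth = true_rank[a] < true_rank[b]
        return (not truth) if rng.random() < error_rate else truth
    return worker


def crowd_less_than(a, b, worker, votes=5):
    """Ask the same pair `votes` times (use an odd number) and take the majority answer."""
    yes = sum(1 for _ in range(votes) if worker(a, b))
    return 2 * yes > votes


def crowd_sort(items, worker, votes=5):
    """Standard sort in which every comparison is answered by the (simulated) crowd."""
    def compare(a, b):
        return -1 if crowd_less_than(a, b, worker, votes) else 1
    return sorted(items, key=cmp_to_key(compare))


# Hypothetical ground truth, used only to drive the simulated worker.
rank = {"kitten": 0, "dog": 1, "wolf": 2, "shark": 3}
perfect_worker = make_noisy_worker(rank, 0.0, random.Random(0))
noisy_worker = make_noisy_worker(rank, 0.2, random.Random(0))

print(crowd_sort(["shark", "kitten", "wolf", "dog"], perfect_worker))
print(crowd_sort(["shark", "kitten", "wolf", "dog"], noisy_worker))
```

With noisy answers the comparator may be inconsistent (cycles), so the output is only some ordering of the items; more votes per pair reduce, but do not remove, that risk, at a proportionally higher cost.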
Also: Interfaces
Comparison
more
dangerous?
how
dangerous?
Rating
15
Overall: Challenges
16
• Which questions do I ask of humans?
• Do I ask sequentially or in parallel?
• How much redundancy in questions?
• How do I combine answers?
• When do I stop?
In the longer part of this talk …
• A recipe for crowdsourced algorithm design
– What you need to take into account
– Plus a couple of examples
17
Next Part: Systems
• Wouldn’t it be nice if you could just “say” what
you wanted gathered or processed, and have
the system do it for you?
– Akin to database systems
– Database systems have a query language: SQL
• Here are some examples
18
Get/Process data
Crowdsourced Data Processing Systems
Country Capital Language
Peru Lima Spanish
Peru Lima Quechua
Brazil Brasilia Portuguese
… … …
Find the capitals of five
Spanish-speaking countries
System
Give me a Spanish-speaking
country
What language do they speak in
country X?
What is the capital of country X?
Give me a valid <Country, Capital,
Language> combination
Gathering
more data
Processing
(Filtering)
19
Country Capital Language
Peru Lima Spanish
Peru Lima Quechua
Brazil Brasilia Portuguese
… … …
Find the capitals of five
Spanish-speaking countries
System
• What if some humans say Brazil
is Spanish-speaking and others
say Portuguese?
• What if some humans answer
“Chile” and others “Chili”?
Inconsistencies
Crowdsourced Data Processing Systems
One specific issue…
20
What are the challenges?
• What is the query language for expressing stuff
like this?
• How is it optimized?
• How does it mesh with existing data?
• How does it deal with the latency of the crowd,
etc.
More on how different systems solve these
challenges later on
21
A Tutorial in Three Parts
Part 0: A (Super Short) Survey of Part 1 and 2, plus
Background (Me)
Part 0.1: Background + Survey of Part 1
Part 0.2: Survey of Part 2
Part 1: A Survey of Crowd-Powered Data
Processing in Academia (Me)
Part 2: A Survey of Crowd-Powered Data
Processing in Industry (Adam)
22
Part 0.2
(survey of Part 2)
23
The Industry Perspective
Circa 2013:
– HCOMP becomes a real conference
Crowdsourcing now an academic discipline
– Industry folks at HCOMP claiming:
• “Crowdsourcing is still a dark art…”
• “We use crowdsourcing at scale… but...”
• “Academics are not solving real problems…”
Problem: No one had really chronicled the use of
crowdsourcing in industry.


24
What happened?
Adam and I spoke to 13 large-scale users of crowds and 4
marketplace vendors to identify:
– scale, use-cases, status-quo
– challenges, pain-points
Tried to bridge the gap between industry and academia
Crowdsourced Data Management: Industry and Academic Perspectives,
Foundations and Trends in Databases series, 2015
25
Qualitative Study:
Who did we talk to?
Team issued a large # of
categorization tasks/week.
Data extraction
from images
Go-to team for
crowdsourcing
26
Shocker I: Internal Platforms
Five of the largest cos. we spoke to primarily use
their own “internal” or “in-house” platforms:
– Workers typically hired via an outsourcing firm
– Working 9 to 5 on this company’s tasks
– May be due to:
• Fine-grained tracking, hiring, leaderboards
• Data of a sensitive nature
• Economies of scale
What we’re seeing is a drop in the bucket.
27
Shocker II: Scale
Most companies use crowdsourcing @ scale
• One reported 50+ employees
just to manage their internal
marketplace
• Another issues 0.5M tasks/week
• Another has an internal crowdsourcing user mailing
list with hundreds of employees
Most large firms spend Ms to 10s of Ms per year, and a
comparable amount administering internal marketplaces.
28
Shocker II: Scale (Continued)
Why the scale?
– AI eating the world: where there’s a model
there’s a need for training data
– Moving target: need for fresh training data as the
problem constantly evolves
– More data beats better models: models trained
are more general, less overfit, …
29
Shocker III: Academic work is not used (yet)!
• Quality assurance: almost all use majority vote;
<50% use fancy stuff.
– <25% use active learning!
• Workflows: most workflows are single step
– “In my experience, if you need multiple steps of
crowdsourcing, it’s almost always more productive to
go back and do a bit more automation upfront.”
• Frameworks: no use of crowdsourced data processing
systems, APIs/frameworks
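Since majority vote is the quality-assurance scheme almost everyone reported, a small sketch of it may help. The function name `majority_vote` and the sample answers are hypothetical; the returned support fraction is an addition for illustration, not something the study describes.

```python
from collections import Counter


def majority_vote(answers):
    """Return (winning answer, fraction of workers who gave it).

    Ties are broken in favour of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one worker answer")
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / len(answers)


label, support = majority_vote(["cat", "dog", "cat"])
print(label, support)
```

A low support value is a cheap signal that an item may deserve more answers, which is roughly where the "fancy stuff" starts.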
30
Other Findings
• Design is super hard
– Many iterations to get to the “right” task
– Some actively use A/B testing between task types
• Top-3 benefits of crowds:
– flexible scaling, low cost, enabled previously
difficult tasks.
– “It’s easier to justify money for crowds than another
employee”
31
Other Findings: Use Cases
1. Categorization
2. Content Moderation
3. Entity Resolution
4. Relevance
5. Data Cleaning
6. Data Extraction
7. Text Generation
32
Major Takeaways
Shockers
I. Understudied paradigm:
“Internal” Marketplaces
II. @ scale – need to shout
from the rooftops!
III. Academic stuff isn’t
used much (yet)
Other Takeaways
I. Academia is working on
the (~) right problems!
II. Crowds admit flexibility
in companies w/o politics
III. Design is super
challenging!
33
What else?
• Sizes of teams, scale, throughput
• Recruiting, retention
• Use cases
• Quality assurance
• Task design and decomposition
• Prior approaches, benefits of crowdsourcing
• Incentivization
Lots of good stuff coming up in Adam’s Part 2!
34
A Tutorial in Three Parts
Part 0: A (Super Short) Survey of Part 1 and 2, plus
Background (Me)
Part 0.1: Background + Survey of Part 1
Part 0.2: Survey of Part 2
Part 1: A Survey of Crowd-Powered Data
Management in Academia (Me)
Part 2: A Survey of Crowd-Powered Data
Management in Industry (Adam)
35
Part 1
36
Data Processing Algorithms
Humans are Data
Processors
How do we design
algorithms using
human operators?
37
Cost: How much am I willing to spend?
Latency: How long can I wait?
Quality: What is my desired quality?
Crowdsourced Data Processing
Marketplace #1 Marketplace #2 Marketplace #n
…
…
• Interfaces
• Incentives
• Trust, reputation
• Spam
• Pricing
Plumbing
Algorithms
Basic ops
• Compare
• Filter
38
Data Processing Algorithms
• Sorting, Max, Top-K
• Filtering, Rating, Finding
• Entity Resolution, Clustering, Joins
• Categorization
• Gathering, Extracting
• Counting
• …
50-odd papers in this space!
@VLDB, SIGMOD, ICDE, …
39
Algorithm
Items
Crowdsourcing
Marketplace
Algorithm Flow
A. Error Model
B. Latency
Model
Current
Answers
Preparation
I. Unit Operations
II. Cost Model
III. Objectives
40
Algorithm Design Recipe
• Explicit Choices:
– Unit Operations
– Cost Model
– Objectives
• Assumptions:
– Error Model
– Latency Model
Illustration:
My paper “CrowdScreen: Algorithms for Filtering…”, SIGMOD 12, AKA Filtering
• Given a dataset of images, find those that don’t show inappropriate content
Adam’s paper “Crowd-Powered Sorts and Joins”, VLDB 11, AKA Sorting
• Given a dataset of animal images, sort them in increasing “dangerousness”
41
Explicit Choice: Unit Operations
What sorts of input can we get from human workers?
• Simple vs. complex:
– Simple = easier to analyze, easier to “aggregate” and assign
correctness to.
– Complex = help us get more fine-grained, open-ended data.
• Number of types:
– One type is simpler to analyze and aggregate than two.
Most work ends up picking a small number of simple operations
Filtering: filter an item
Sorting: compare two items, or rate an item
42
Explicit Choice: Cost Model
43
How do we set the reward for each unit operation?
Cost can depend on:
• Type of operation
• Type of item
• Number of items
Typical rule of thumb: time the operation, pay using minimum wage
→ Simple assumption: same cost for each operation
Filtering: c(filter an item) constant
Sorting: c(compare two items) = c(rate an item)
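The rule of thumb is simple arithmetic, shown here as a short sketch. The function names and all numbers (30 seconds per task, a wage of 12.00 per hour, 1000 items, 3 answers each) are made-up illustration values, not figures from the tutorial.

```python
def reward_per_task(seconds_per_task, hourly_wage):
    """Reward for one unit operation: the time it takes, priced at the hourly wage."""
    return hourly_wage * seconds_per_task / 3600.0


def total_cost(n_items, answers_per_item, seconds_per_task, hourly_wage):
    """Total spend when every operation costs the same (the simple assumption)."""
    return n_items * answers_per_item * reward_per_task(seconds_per_task, hourly_wage)


# Hypothetical numbers: a 30-second task at 12.00 per hour.
per_task = reward_per_task(30, 12.00)
budget = total_cost(1000, 3, 30, 12.00)
print(per_task, budget)
```

Under the same-cost assumption, total cost is just a count of operations, which is why the algorithms that follow can measure cost as "number of questions asked".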
Explicit Choice: Objectives
What do we optimize for?
Care about cost, latency, quality.
• Bound one (or two), optimize others
Typically bound on cost, maximize quality
Sometimes bound on quality, minimize cost
Filtering: bound on quality, minimize cost
Sorting: bound on cost, maximize quality
44
Assumption: Error Model
How do we model human accuracies?
All models are wrong, but can still be useful.
• Simplest model: no errors!
– Similar: ask fixed # of workers, then no error
– Same error probability per worker (Filtering)
• Each worker has a fixed error probability
• Each worker has an error probability dependent on item
• No assumptions about error – just get something that
works well (Sorting)
Opt for what can be analyzed – simple is good
This is a bit of an “art” and may require iterations
45
Placing it all together: Filtering
• Goal: filter a set of items on some property X; i.e., find
all items that satisfy X
• Operation: ask a person “does this item satisfy the
filter or not?”
• Cost model: all operations cost the same
• Objective: accuracy across all items is fixed (alpha,
e.g., 95%), minimize cost
• Error model: people make mistakes with a fixed
probability (beta, e.g., 5%)
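Under the fixed-error model, the evidence for one item after some Yes and No answers can be computed with Bayes' rule. The sketch below is an illustration under extra assumptions that are not on the slide: a prior probability `prior` that an item satisfies the filter, and a per-item stopping rule that compares the posterior with alpha (the slide's objective is accuracy across all items, which this only approximates). The names `posterior_pass` and `decide` are hypothetical.

```python
def posterior_pass(yes, no, beta=0.05, prior=0.5):
    """Pr[item satisfies the filter | yes Yes-answers, no No-answers].

    Assumes every answer is wrong independently with probability beta (0 < beta < 0.5).
    """
    like_pass = prior * (1 - beta) ** yes * beta ** no
    like_fail = (1 - prior) * beta ** yes * (1 - beta) ** no
    return like_pass / (like_pass + like_fail)


def decide(yes, no, alpha=0.95, beta=0.05, prior=0.5):
    """Per-item rule (an assumption): stop once one outcome has posterior >= alpha."""
    p = posterior_pass(yes, no, beta, prior)
    if p >= alpha:
        return "PASS"
    if 1 - p >= alpha:
        return "FAIL"
    return "continue"


print(posterior_pass(2, 0), decide(2, 0))
print(posterior_pass(1, 1), decide(1, 1))
print(posterior_pass(0, 2), decide(0, 2))
```

One Yes and one No cancel out exactly when the prior is 0.5, so the item needs more answers; two agreeing answers are already enough at these parameter values.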
46
[Figure: Dataset of Items → Boolean Predicate (“Does this image show an animal?”) → Filtered Dataset]
Our Visualization of Strategies
[Figure: grid with axes counting Yes and No answers (1 to 5 each); every grid point is marked decide PASS, continue, or decide FAIL]
Markov
Decision
Process
47
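Evaluating Strategies
[Figure: grid with axes counting Yes and No answers (1 to 5 each); every grid point is marked decide PASS, continue, or decide FAIL]
$$\Pr[\text{reach }(4,2)] = \Pr[\text{reach }(4,1) \wedge \text{get a No}] + \Pr[\text{reach }(3,2) \wedge \text{get a Yes}]$$
$$\text{Cost} = \sum_{(x,y)} (x+y)\,\Pr[\text{reach }(x,y)]$$
$$\text{Error} = \sum_{(x,y)\,\in\,\text{FAIL}} \Pr[\text{reach }(x,y) \wedge 1] + \sum_{(x,y)\,\in\,\text{PASS}} \Pr[\text{reach }(x,y) \wedge 0]$$
Here x counts Yes answers and y counts No answers; the Cost sum ranges over the points where the strategy stops (decide PASS or decide FAIL), and 1 / 0 denote the item's true value.
48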
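The evaluation on this slide can be sketched as a small dynamic program over the grid. This is an illustrative reading, not the paper's code: it assumes x counts Yes answers and y counts No answers, that expected cost is summed over the points where the strategy stops, that an error is a PASS on an item whose true value is 0 or a FAIL on one whose true value is 1, and that a prior `prior` for the true value is given. The names `evaluate_strategy` and `majority_of_three` are hypothetical.

```python
def evaluate_strategy(strategy, m, beta, prior):
    """Return (expected cost, error probability) of a strategy on the Yes/No grid.

    strategy(x, y) -> "continue" | "pass" | "fail", for x Yes and y No answers.
    Every reachable point with x + y == m must be "pass" or "fail".
    """
    # reach[v][(x, y)] = Pr[reach (x, y) and the item's true value is v]
    reach = {1: {(0, 0): prior}, 0: {(0, 0): 1.0 - prior}}
    cost, error = 0.0, 0.0
    for total in range(m + 1):              # process points in order of answers received
        for x in range(total + 1):
            y = total - x
            p1 = reach[1].get((x, y), 0.0)
            p0 = reach[0].get((x, y), 0.0)
            if p1 + p0 == 0.0:
                continue                     # point is never reached
            action = strategy(x, y)
            if action == "continue":
                if total == m:
                    raise ValueError("strategy must stop after m answers")
                # True value 1: a Yes arrives with probability 1 - beta.
                reach[1][(x + 1, y)] = reach[1].get((x + 1, y), 0.0) + p1 * (1 - beta)
                reach[1][(x, y + 1)] = reach[1].get((x, y + 1), 0.0) + p1 * beta
                # True value 0: a Yes arrives with probability beta.
                reach[0][(x + 1, y)] = reach[0].get((x + 1, y), 0.0) + p0 * beta
                reach[0][(x, y + 1)] = reach[0].get((x, y + 1), 0.0) + p0 * (1 - beta)
            else:
                cost += (x + y) * (p1 + p0)
                error += p0 if action == "pass" else p1
    return cost, error


def majority_of_three(x, y):
    """Example strategy: stop as soon as two answers agree."""
    if x >= 2:
        return "pass"
    if y >= 2:
        return "fail"
    return "continue"


cost, error = evaluate_strategy(majority_of_three, m=3, beta=0.1, prior=0.5)
print(cost, error)
```

For this example the result can be checked by hand: the strategy always asks two questions and asks a third only when the first two disagree (probability 0.18), and it errs when at least two of three answers are wrong (probability 0.028).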
Naïve Approach
[Figure: the Yes/No grid (1 to 5 on each axis)]
For each grid point, assign decide PASS, continue, or decide FAIL
For all strategies:
• Evaluate cost & error
Return the best
O(3^g), g = O(m^2)
This is obviously bad.
Paper has probabilistic methods that
identify optimal strategies (LP)
49
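A quick count shows why enumeration is hopeless. The sketch assumes m is the maximum number of answers per item (the slide does not define m) and counts every grid point with x + y ≤ m as having three possible labels, which gives an upper bound on the number of strategies. Function names are hypothetical.

```python
def grid_points(m):
    """Number of points (x, y) with x, y >= 0 and x + y <= m."""
    return (m + 1) * (m + 2) // 2


def naive_strategy_count(m):
    """Upper bound on strategies: 3 labels (PASS, continue, FAIL) per grid point."""
    return 3 ** grid_points(m)


for m in (2, 5, 10):
    print(m, grid_points(m), naive_strategy_count(m))
```

Even at m = 5 there are 21 grid points and more than ten billion labelings, each of which would still need its cost and error evaluated.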
Placing it all together: Sorting
• Goal: sort a set of items on some property X
• Operation: ask a person “is A better than B on
property X”, or “rate A on property X”
• Cost model: all operations cost the same
• Objective: total cost is fixed, maximize accuracy
• Error model: more ad-hoc; no fixed assumption
50
[Figure: Dataset of Items → Sort on Predicate (“Sort animals on dangerousness”) → Sorted Dataset]
• Completely Comparison-Based
– Accuracy = 1 (completely accurate)
– O((# items)^2)
• Completely Rating-Based
– Accuracy ≈ 0.8 (accurate)
– O(# items)
Placing it all together: Sorting
51
• First, gather a bunch of ratings
• Order based on average ratings
• Then, use comparisons, in one of three
flavors:
– Random: pick S items, compare
– Confidence-based: pick most confusing
“window”, compare that first, repeat
– Sliding-window: for all windows, compare
→ the best
Placing it all together: Sorting
52
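The hybrid idea (ratings first, comparisons to refine) can be sketched as follows. This is a simplified sliding-window variant written for illustration, not the algorithm from the paper: the window size, step, function names, ratings, and the stand-in comparison function are all assumptions.

```python
from functools import cmp_to_key
from statistics import mean


def rating_order(ratings):
    """ratings: dict item -> list of numeric ratings. Order items by average rating."""
    return sorted(ratings, key=lambda item: mean(ratings[item]))


def refine_window(order, start, size, less_than):
    """Re-sort one window of the rating-based order using pairwise comparisons."""
    window = sorted(order[start:start + size],
                    key=cmp_to_key(lambda a, b: -1 if less_than(a, b) else 1))
    return order[:start] + window + order[start + size:]


def hybrid_sort(ratings, less_than, window=3, step=1):
    """Start from average ratings, then slide a comparison window over the order."""
    order = rating_order(ratings)
    for start in range(0, max(1, len(order) - window + 1), step):
        order = refine_window(order, start, window, less_than)
    return order


# Hypothetical data: ratings put "wolf" below "dog"; comparisons fix that.
true_rank = {"kitten": 0, "dog": 1, "wolf": 2, "shark": 3}
ratings = {"kitten": [1, 2], "dog": [4, 3], "wolf": [3, 3], "shark": [7, 6]}
compare = lambda a, b: true_rank[a] < true_rank[b]   # stands in for a crowd comparison

print(rating_order(ratings))
print(hybrid_sort(ratings, compare))
```

Ratings cost one task per item and get the order roughly right; comparisons are spent only inside small windows, so the number of comparison tasks stays far below the quadratic cost of a purely comparison-based sort.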
[Figure: Accuracy (0.8 to 1) vs. # Tasks (0 to 80); series: Compare, Rate]
53
[Figure: Accuracy (0.8 to 1) vs. # Tasks (0 to 80); series: Hybrid, Compare, Rate]
54
Crowdsourced Data Processing
Marketplace #1 Marketplace #2 Marketplace #n
…
…
• Interfaces
• Incentives
• Trust, reputation
• Spam
• Pricing
Plumbing
Algorithms
Systems
Basic ops
• Compare
• Filter
Complex ops
• Sort
• Cluster
• Clean
• Get data
• Verify
55
Data Processing Systems
Declarative Crowdsourcing Systems:
Qurk (MIT), Deco (Stanford/UCSC), CrowdDB (Berkeley)
Treat crowds as just another “access method” for the database
• Fetch data from disk, the web, … , the crowd
• Not just process data, but also gather data.
56
There Are Other Systems…
• Toolkits
– TurKit, AutoMan
– Crowds = “API calls”
– Little to no optimization
• Imperative Systems
– Jabberwocky, CrowdForge
– Crowds = “Data Processing Units”
– Programmer dictated flow, limited optimization within the units
• Declarative Systems
– Deco, Qurk, CrowdDB
– Crowds = “Data Processing Units”
– Programmer specifies goal, optimized across the spectrum
57
Increasing declarativity, from top to bottom:
Toolkits: analogous to programming APIs
Imperative systems: analogous to Pig or Map-Reduce
Declarative systems: analogous to relational databases
Why is Declarative Good?
• Takes away repeated code and redundancy
• No need for manual optimization
• Less cumbersome to specify
58
What does one need? (Simple Version)
1. A Mechanism to “Store”/”Represent” Data
2. A Mechanism to “Get” More Data
3. A Mechanism to “Fix” Existing Data
4. A “Query” Language
Two prototypical systems:
Deco: an end-to-end redesign
Qurk: a small modification to existing databases
59
[Figure: Deco data model. Fetch rules populate the raw tables, resolution rules clean them, and the cleaned tables are joined (⋈) into the user view.]
Raw tables:
A (name): Peru; France [fetch rule: φ → name]
D1 (name, language): (Peru, Spanish); (Peru, Spanish); (France, French) [fetch rules: name → language, language → name]
D2 (name, capital): (Peru, Lima); (France, Nice); (France, Paris); (France, Paris) [fetch rule: name → capital]
After resolution rules:
D1 (name, language): (Peru, Spanish); (France, French)
D2 (name, capital): (Peru, Lima); (France, Paris)
User view (name, language, capital): (Peru, Spanish, Lima); (France, French, Paris)
60
Deco: Declarative Crowdsourcing
DBMS
1) Representation scheme:
Countries(name,lang,capital)
2) “Get” more data: fetch rules
name → capital
capital, lang → name
3) “Fix” data: resolution rules
lang: dedup()
capital: majority()
4) Declarative queries
select name from Countries
where language = ‘Spanish’
atleast 5
User or Application
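The effect of the resolution rules can be mimicked in a few lines of Python. This is only a sketch of the idea and not Deco's implementation or API: `dedup`, `majority`, `resolve`, and `user_view` are hypothetical helpers, and the raw rows are the ones shown in the Deco figure (including the stray "Nice" answer for France).

```python
from collections import Counter, defaultdict


def dedup(values):
    """Keep one copy of each distinct value."""
    return sorted(set(values))


def majority(values):
    """Keep only the most frequent value."""
    return [Counter(values).most_common(1)[0][0]]


def resolve(raw_rows, rule):
    """Group raw (name, value) rows by name and apply a resolution rule per name."""
    grouped = defaultdict(list)
    for name, value in raw_rows:
        grouped[name].append(value)
    return {name: rule(values) for name, values in grouped.items()}


def user_view(names, languages, capitals):
    """Join the resolved tables on name; missing values show up as None."""
    return [(name, lang, cap)
            for name in names
            for lang in languages.get(name, [None])
            for cap in capitals.get(name, [None])]


raw_language = [("Peru", "Spanish"), ("Peru", "Spanish"), ("France", "French")]
raw_capital = [("Peru", "Lima"), ("France", "Nice"), ("France", "Paris"), ("France", "Paris")]

view = user_view(["Peru", "France"],
                 resolve(raw_language, dedup),
                 resolve(raw_capital, majority))
spanish = [name for name, lang, _ in view if lang == "Spanish"]
needs_more_fetches = len(spanish) < 5   # the query asks for at least 5 answers
print(view, spanish, needs_more_fetches)
```

With only two countries in the raw tables the query is not yet satisfied, which is the point at which a system of this kind would invoke fetch rules to ask the crowd for more data.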
61
Qurk
• A regular old database
• Human processing/gathering as UDFs
– User-defined functions
– Commonly also used by relational databases to
capture operations outside relational algebra
– Typically external API calls
62
Qurk filter: inappropriate content
photos(id PRIMARY KEY, picture IMAGE)
Query = SELECT * FROM photos WHERE
isSmiling(photos.picture);
UDF
1) Representation scheme:
UDFs are “pre-declared”
2) “Get” more data:
UDFs translate into one/more fixed task types
3) “Fix” data:
UDFs internally handle quality assurance
4) “Query”
SQL + UDF
63
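The "crowd as a UDF" idea can be illustrated outside a database. The sketch below is not Qurk's API: `make_crowd_udf`, `select_where`, the fake worker, and the file names are all hypothetical. It only shows the shape of the design, where the query sees an ordinary boolean function and the redundancy and voting live inside it.

```python
def make_crowd_udf(ask_worker, votes=3):
    """Wrap a per-worker question as a boolean UDF with built-in majority voting."""
    def udf(value):
        yes = sum(1 for _ in range(votes) if ask_worker(value))
        return 2 * yes > votes
    return udf


def select_where(rows, column, predicate):
    """Stand-in for SELECT * FROM rows WHERE predicate(rows.column)."""
    return [row for row in rows if predicate(row[column])]


# Hypothetical stand-in for posting a task and reading one worker's answer.
def fake_worker(picture):
    return picture.endswith("smile.jpg")


is_smiling = make_crowd_udf(fake_worker, votes=3)
photos = [{"id": 1, "picture": "a_smile.jpg"}, {"id": 2, "picture": "b_frown.jpg"}]
result = select_where(photos, "picture", is_smiling)
print(result)
```

Because quality assurance is hidden inside the UDF, the query text stays plain SQL plus a function call, which is what makes the "small modification to existing databases" framing work.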
A Tutorial in Three Parts
Part 0: A (Super Short) Survey of Part 1 and 2, plus
Background (Me)
Part 0.1: Background + Survey of Part 1
Part 0.2: Survey of Part 2
Part 1: A Survey of Crowd-Powered Data
Management in Academia (Me)
Part 2: A Survey of Crowd-Powered Data
Management in Industry (Adam) ← UP NEXT
64
A little about me
Assistant prof at Illinois since 2014
Thesis work on crowdsourced data processing
Now work on Human-in-the-loop data analytics (HILDA)
Twitter: @adityagp
Homepage: http://data-people.cs.illinois.edu
65
Understand
Visualize
Manipulate
Collaborate
http://populace-org.github.io
http://orpheus-db.github.io
http://zenvisage.github.io
http://dataspread.github.io
66

Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Delhi Call girls
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdfMatthew Sinclair
 
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋nirzagarg
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...nirzagarg
 
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...nilamkumrai
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...SUHANI PANDEY
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...tanu pandey
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...SUHANI PANDEY
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableSeo
 

Dernier (20)

Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...Katraj ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For S...
Katraj ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For S...
 
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
Wadgaon Sheri $ Call Girls Pune 10k @ I'm VIP Independent Escorts Girls 80057...
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
Sarola * Female Escorts Service in Pune | 8005736733 Independent Escorts & Da...
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
Pirangut | Call Girls Pune Phone No 8005736733 Elite Escort Service Available...
 
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRLLucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
Lucknow ❤CALL GIRL 88759*99948 ❤CALL GIRLS IN Lucknow ESCORT SERVICE❤CALL GIRL
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
💚😋 Bilaspur Escort Service Call Girls, 9352852248 ₹5000 To 25K With AC💚😋
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
 
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
 
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...Nanded City ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready ...
Nanded City ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready ...
 
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
Ganeshkhind ! Call Girls Pune - 450+ Call Girl Cash Payment 8005736733 Neha T...
 
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
📱Dehradun Call Girls Service 📱☎️ +91'905,3900,678 ☎️📱 Call Girls In Dehradun 📱
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 

Crowdsourced Data Processing: Industry and Academic Perspectives

  • 1. Crowdsourced Data Processing: Industry and Academic Perspectives. Adam Marcus and Aditya Parameswaran
  • 2. A Tutorial in Three Parts. Part 0: A (Super Short) Survey of Parts 1 and 2, plus Background (Me). Part 0.1: Background + Survey of Part 1. Part 0.2: Survey of Part 2. Part 1: A Survey of Crowd-Powered Data Processing in Academia (Me). Part 2: A Survey of Crowd-Powered Data Processing in Industry (Adam)
  • 4. What is crowdsourcing? • Our definition [Von Ahn]: Crowdsourcing is a paradigm that utilizes human processing power to solve problems that computers cannot yet solve, e.g., processing and understanding images, videos, and text (80% or more of all data, according to a 5-year-old IBM study). Items: e.g., images, text
  • 5. Why is it important? We’re on the cusp of an AI revolution [NYT, July ’16]: “a transformation many believe will have a payoff on the scale of … personal computing … or the internet”. AI requires large volumes of training data. Our best hope of understanding images, videos, and text comes from humans.
  • 6. How does one deploy crowdsourcing? • Our focus: paid crowdsourcing – Other ways: volunteer, gaming – “paid” is broad: $$, pigs on your farm, MBs, bitcoin, … • A typical paid platform: – Requesters put jobs up, assign rewards – Workers pick up and work on these jobs, get rewards
  • 7. Our Focus: Data, Data, Data. How do we get crowds to process large volumes of data efficiently and effectively? – Design of algorithms – Design of systems. We call this “crowdsourced data processing”. This is the primary concern of industry users.
  • 8. Context: Other Work. Crowdsourced data processing depends on many other fields … (but they are not the focus of this tutorial)
  • 9. Humans = Data Processors. Our abstraction: Humans are Data Processors • compare two items • rate an item • evaluate a predicate on an item. The human operator set is not fully known or understood!
  • 12. But: Unlike Computer Processors … Humans cost money, take time, and make mistakes. Cost: How much am I willing to spend? Latency: How long can I wait? Quality: What is my desired quality? So, algorithm development has to be done “ab initio”.
  • 13. Illustration of Challenges: Sorting. Sort n animals on “dangerousness” • Option 1: give it all to one human worker – could take very long, likely error-prone. • Option 2: apply a sorting algorithm, with pairwise comparisons done by humans instead of automatically
  • 14. Illustration of Challenges: Sorting • Option 2, but: – Workers may make mistakes! So how do you know if you can trust a worker response? – Cycles may form – Should we get more worker answers for the same pair or for different pairs?
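Option 2 can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not the method from the tutorial: the `make_noisy_worker` helper, the single fixed `error_rate`, and the five-vote majority per pair are all hypothetical choices made for illustration. A real deployment would post each pair as a task on a marketplace instead of calling a simulated worker.

```python
import random
from functools import cmp_to_key


def make_noisy_worker(true_rank, error_rate, rng):
    """Return a simulated worker that answers 'is a less dangerous than b?'."""
    def ask(a, b):
        truth = true_rank[a] < true_rank[b]
        # With probability error_rate the worker gives the wrong answer.
        return (not truth) if rng.random() < error_rate else truth
    return ask


def majority_less_than(ask, a, b, votes=5):
    """Ask the same pair `votes` times (odd number) and take the majority."""
    yes = sum(ask(a, b) for _ in range(votes))
    return yes * 2 > votes


def crowd_sort(items, ask, votes=5):
    """Sort items using majority-voted human comparisons.

    With noisy answers the comparator can be inconsistent (cycles may form),
    so the result is only approximately sorted.
    """
    def cmp(a, b):
        if a == b:
            return 0
        return -1 if majority_less_than(ask, a, b, votes) else 1
    return sorted(items, key=cmp_to_key(cmp))


# Hypothetical ground truth, used only to drive the simulated worker.
true_rank = {"rabbit": 0, "dog": 1, "wolf": 2, "tiger": 3}
worker = make_noisy_worker(true_rank, error_rate=0.2, rng=random.Random(0))
print(crowd_sort(["tiger", "rabbit", "wolf", "dog"], worker))
```

Raising `votes` spends more money on the same pair; the open question on the slide is whether that budget would be better spent on different pairs.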
  • 16. Overall: Challenges • Which questions do I ask of humans? • Do I ask sequentially or in parallel? • How much redundancy in questions? • How do I combine answers? • When do I stop?
  • 17. In the longer part of this talk … • A recipe for crowdsourced algorithm design – What you need to take into account – Plus a couple of examples
  • 18. Next Part: Systems • Wouldn’t it be nice if you could just “say” what you wanted gathered or processed, and have the system do it for you? – Akin to database systems – Database systems have a query language: SQL • Here are some examples
  • 19. Crowdsourced Data Processing Systems. Query to the system: “Find the capitals of five Spanish-speaking countries”. Get/process data in a table (Country, Capital, Language): (Peru, Lima, Spanish); (Peru, Lima, Quechua); (Brazil, Brasilia, Portuguese); … Questions the system can ask the crowd, for gathering more data and for processing (filtering): “Give me a Spanish-speaking country”; “What language do they speak in country X?”; “What is the capital of country X?”; “Give me a valid <Country, Capital, Language> combination”
  • 20. Crowdsourced Data Processing Systems: one specific issue… Inconsistencies. Query to the system: “Find the capitals of five Spanish-speaking countries”. Table (Country, Capital, Language): (Peru, Lima, Spanish); (Peru, Lima, Quechua); (Brazil, Brasilia, Portuguese); … • What if some humans say Brazil is Spanish-speaking and others say Portuguese? • What if some humans answer “Chile” and others “Chili”?
  • 21. What are the challenges? • What is the query language for expressing stuff like this? • How is it optimized? • How does it mesh with existing data? • How does it deal with the latency of the crowd, etc.? More on how different systems solve these challenges later on.
  • 22. A Tutorial in Three Parts. Part 0: A (Super Short) Survey of Parts 1 and 2, plus Background (Me). Part 0.1: Background + Survey of Part 1. Part 0.2: Survey of Part 2. Part 1: A Survey of Crowd-Powered Data Processing in Academia (Me). Part 2: A Survey of Crowd-Powered Data Processing in Industry (Adam)
  • 23. Part 0.2 (survey of Part 2)
  • 24. The Industry Perspective. Circa 2013: – HCOMP becomes a real conference; crowdsourcing is now an academic discipline – Industry folks at HCOMP claiming: • “Crowdsourcing is still a dark art…” • “We use crowdsourcing at scale… but...” • “Academics are not solving real problems…” Problem: No one had really chronicled the use of crowdsourcing in industry.
  • 25. What happened? Adam and I spoke to 13 large-scale users of crowds + 4 marketplace vendors to identify: – scale, use cases, status quo – challenges, pain points. Tried to bridge the gap between industry and academia. Crowdsourced Data Management: Industry and Academic Perspectives, Foundations and Trends in DB Series, 2015
  • 26. Qualitative Study: Who did we talk to? Team issued a large # of categorization tasks/week. Data extraction from images. Go-to team for crowdsourcing.
  • 27. Shocker I: Internal Platforms. Five of the largest companies we spoke to primarily use their own “internal” or “in-house” platforms: – Workers typically hired via an outsourcing firm – Working 9 to 5 on this company’s tasks – May be due to: • Fine-grained tracking, hiring, leaderboards • Data of a sensitive nature • Economies of scale. What we’re seeing is a drop in the bucket.
  • 28. Shocker II: Scale. Most companies use crowdsourcing @ scale • One reported 50+ employees just to manage their internal marketplace • Another issues 0.5M tasks/week • Another has an internal crowdsourcing user mailing list with hundreds of employees. Most large firms spend Ms to 10s of Ms/year, and a comparable amount administering internal marketplaces.
  • 29. Shocker II: Scale (Continued). Why the scale? – AI eating the world: where there’s a model there’s a need for training data – Moving target: need for fresh training data as the problem constantly evolves – More data beats better models: models trained on more data are more general, less overfit, …
  • 30. Shocker III: Academic work is not used (yet)! • Quality assurance: almost all use majority vote; <50% use fancy stuff. – <25% use active learning! • Workflows: most workflows are single step – “In my experience, if you need multiple steps of crowdsourcing, it’s almost always more productive to go back and do a bit more automation upfront.” • Frameworks: no use of crowdsourced data processing systems, APIs/frameworks
  • 31. Other Findings • Design is super hard – Many iterations to get to the “right” task – Some actively use A/B testing between task types • Top-3 benefits of crowds: – flexible scaling, low cost, enabled previously difficult tasks. – “It’s easier to justify money for crowds than another employee”
  • 32. Other Findings: Use Cases 1. Categorization 2. Content Moderation 3. Entity Resolution 4. Relevance 5. Data Cleaning 6. Data Extraction 7. Text Generation
  • 33. Major Takeaways. Shockers: I. Understudied paradigm: “Internal” Marketplaces II. @ scale – need to shout from the rooftops! III. Academic stuff isn’t used much (yet). Other Takeaways: I. Academia is working on the (~) right problems! II. Crowds admit flexibility in companies w/o politics III. Design is super challenging!
  • 34. What else? • Sizes of teams, scale, throughput • Recruiting, retention • Use cases • Quality assurance • Task design and decomposition • Prior approaches, benefits of crowdsourcing • Incentivization. Lots of good stuff coming up in Adam’s Part 2!
  • 35. A Tutorial in Three Parts. Part 0: A (Super Short) Survey of Parts 1 and 2, plus Background (Me). Part 0.1: Background + Survey of Part 1. Part 0.2: Survey of Part 2. Part 1: A Survey of Crowd-Powered Data Management in Academia (Me). Part 2: A Survey of Crowd-Powered Data Management in Industry (Adam)
  • 37. Data Processing Algorithms. Humans are Data Processors. How do we design algorithms using human operators? Cost: How much am I willing to spend? Latency: How long can I wait? Quality: What is my desired quality?
  • 38. Crowdsourced Data Processing. Marketplace #1, Marketplace #2, …, Marketplace #n • Interfaces • Incentives • Trust, reputation • Spam • Pricing. Layers: Plumbing, Algorithms. Basic ops: • Compare • Filter
  • 39. Data Processing Algorithms • Sorting, Max, Top-K • Filtering, Rating, Finding • Entity Resolution, Clustering, Joins • Categorization • Gathering, Extracting • Counting • … 50-odd papers in this space! @ VLDB, SIGMOD, ICDE, …
  • 40. Algorithm Flow (diagram elements: Items, Algorithm, Crowdsourcing Marketplace, Current Answers). Preparation: I. Unit Operations, II. Cost Model, III. Objectives. A. Error Model, B. Latency Model
  • 41. Algorithm Design Recipe • Explicit Choices: – Unit Operations – Cost Model – Objectives • Assumptions: – Error Model – Latency Model. Illustration: My paper “Crowdscreen: Algorithms for Filtering…”, SIGMOD ’12, AKA Filtering • Given a dataset of images, find those that don’t show inappropriate content. Adam’s paper “Crowd-Powered Sorts and Joins”, VLDB ’11, AKA Sorting • Given a dataset of animal images, sort them in increasing “dangerousness”
  • 42. Explicit Choice: Unit Operations. What sorts of input can we get from human workers? • Simple vs. complex: – Simple = easier to analyze, easier to “aggregate” and assign correctness to. – Complex = helps us get more fine-grained, open-ended data. • Number of types: – One type is simpler to analyze and aggregate than two. Most work ends up picking a small number of simple operations. Filtering: filter an item. Sorting: compare two items, or rate an item
  • 43. Explicit Choice: Cost Model. How do we set the reward for each unit operation? Cost can depend on: • Type of operation • Type of item • Number of items. Typical rule of thumb – time the operation, pay using minimum wage. Simple assumption: same cost for each operation. Filtering: c(filter an item) is constant. Sorting: c(compare two items) = c(rate an item)
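The rule of thumb above (time the operation, pay at an hourly wage) is simple arithmetic, sketched here in Python. The function names, the 30-second task time, and the 12.00 hourly wage are hypothetical values chosen only to make the numbers easy to follow; they do not come from the tutorial.

```python
def reward_per_task(seconds_per_task, hourly_wage):
    """Reward for one unit operation, paying for the time it takes."""
    return seconds_per_task * hourly_wage / 3600.0


def total_cost(n_items, answers_per_item, reward):
    """Budget under the 'same cost for each operation' assumption."""
    return n_items * answers_per_item * reward


# Hypothetical numbers: a 30-second task paid at 12.00 per hour.
reward = reward_per_task(seconds_per_task=30, hourly_wage=12.0)
budget = total_cost(n_items=10_000, answers_per_item=3, reward=reward)
print(f"reward per task: {reward:.2f}, total budget: {budget:.2f}")
```

Because every operation costs the same under this assumption, minimizing cost is the same as minimizing the number of questions asked.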
  • 44. Explicit Choice: Objectives. What do we optimize for? Care about cost, latency, quality. • Bound one (or two), optimize others. Typically bound on cost, maximize quality. Sometimes bound on quality, minimize cost. Filtering: bound on quality, minimize cost. Sorting: bound on cost, maximize quality
  • 45. Assumption: Error Model. How do we model human accuracies? All models are wrong, but can still be useful. • Simplest model: no errors! – Similar: ask fixed # of workers, then no error – Same error probability per worker (Filtering) • Each worker has a fixed error probability • Each worker has an error probability dependent on item • No assumptions about error – just get something that works well (Sorting). Opt for what can be analyzed – simple is good. This is a bit of an “art” – may require iterations
  • 46. Placing it all together: Filtering • Goal: filter a set of items on some property X; i.e., find all items that satisfy X • Operation: ask a person “does this item satisfy the filter or not?” • Cost model: all operations cost the same • Objective: accuracy across all items is fixed (alpha, e.g., 95%), minimize cost • Error model: people make mistakes with a fixed probability (beta, e.g., 5%). [Diagram: Dataset of Items, Boolean Predicate (“Does this image show an animal?”), Filtered Dataset]
  • 47. Our Visualization of Strategies. [Grid figure: the axes count Yes and No answers (1 to 5 each); each grid point is marked decide PASS, continue, or decide FAIL.] Markov Decision Process
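  • 48. Evaluating Strategies. [Same grid figure: each point $(x, y)$ is marked decide PASS, continue, or decide FAIL.] $\Pr[\text{reach}(4,2)] = \Pr[\text{reach}(4,1) \wedge \text{get a No}] + \Pr[\text{reach}(3,2) \wedge \text{get a Yes}]$. $\text{Cost} = \sum_{(x,y)\,\text{terminating}} (x+y)\,\Pr[\text{reach}(x,y)]$. $\text{Error} = \sum_{(x,y)\,\text{decide FAIL}} \Pr[\text{reach}(x,y) \wedge 1] + \sum_{(x,y)\,\text{decide PASS}} \Pr[\text{reach}(x,y) \wedge 0]$, where 1 and 0 denote the item’s true value.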
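The cost and error sums on this slide can be computed with a short dynamic program over the grid. The Python sketch below is an illustration under stated assumptions, not code from the CrowdScreen paper: `evaluate_strategy`, the `selectivity` prior (probability that an item truly satisfies the filter), and the single symmetric `error_rate` are names and modeling choices introduced here. Following the recurrence on the slide, `x` counts Yes answers and `y` counts No answers.

```python
PASS, FAIL, CONTINUE = "PASS", "FAIL", "CONTINUE"


def evaluate_strategy(strategy, selectivity, error_rate, max_answers):
    """Return (expected cost, expected error) of a strategy on the grid.

    strategy maps (x, y) = (#Yes, #No) to PASS, FAIL, or CONTINUE and must
    cover every reachable point with x + y <= max_answers.
    """
    # reach[t][(x, y)] = Pr[true value is t AND the strategy reaches (x, y)]
    reach = {1: {(0, 0): selectivity}, 0: {(0, 0): 1.0 - selectivity}}
    cost = 0.0
    error = 0.0
    # Visit points in order of x + y so predecessors are always done first.
    for total in range(max_answers + 1):
        for x in range(total + 1):
            y = total - x
            for truth in (0, 1):
                p = reach[truth].get((x, y), 0.0)
                if p == 0.0:
                    continue  # point is never reached for this true value
                action = strategy[(x, y)]
                p_yes = 1.0 - error_rate if truth == 1 else error_rate
                if action == CONTINUE:
                    nxt = reach[truth]
                    nxt[(x + 1, y)] = nxt.get((x + 1, y), 0.0) + p * p_yes
                    nxt[(x, y + 1)] = nxt.get((x, y + 1), 0.0) + p * (1.0 - p_yes)
                else:
                    cost += (x + y) * p  # x + y questions were asked
                    if (action == PASS) != (truth == 1):
                        error += p  # passed a 0 item or failed a 1 item
    return cost, error


# Example strategy: stop as soon as two answers agree (majority of three).
majority_of_3 = {
    (0, 0): CONTINUE, (1, 0): CONTINUE, (0, 1): CONTINUE, (1, 1): CONTINUE,
    (2, 0): PASS, (2, 1): PASS, (0, 2): FAIL, (1, 2): FAIL,
}
cost, error = evaluate_strategy(majority_of_3, selectivity=0.5,
                                error_rate=0.1, max_answers=3)
print(f"expected cost = {cost:.3f} questions, expected error = {error:.3f}")
```

With a 10% worker error rate this strategy asks about 2.18 questions per item on average and errs on about 2.8% of items, which shows how a strategy trades cost against quality.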
  • 49. Naïve Approach. For each grid point, assign decide PASS, continue, or decide FAIL. For all strategies: • Evaluate cost & error. Return the best. $O(3^g)$, $g = O(m^2)$. This is obviously bad. Paper has probabilistic methods that identify optimal strategies (LP)
  • 50. Placing it all together: Sorting • Goal: sort a set of items on some property X • Operation: ask a person “is A better than B on property X”, or “rate A on property X” • Cost model: all operations cost the same • Objective: total cost is fixed, maximize accuracy • Error model: more ad-hoc; no fixed assumption. [Diagram: Dataset of Items, Sort on Predicate (“Sort animals on dangerousness”), Sorted Dataset]
  • 51. Placing it all together: Sorting • Completely Comparison-Based – Accuracy = 1 (completely accurate) – $O((\#\,\text{items})^2)$ • Completely Rating-Based – Accuracy ≈ 0.8 (accurate) – $O(\#\,\text{items})$
  • 52. Placing it all together: Sorting • First, gather a bunch of ratings • Order based on average ratings • Then, use comparisons, in one of three flavors: – Random: pick S items, compare – Confidence-based: pick most confusing “window”, compare that first, repeat – Sliding-window: for all windows, compare  the best
  • 53. [Plot: Accuracy (0.8 to 1) vs. # Tasks (0 to 80) for Compare and Rate]
  • 54. [Plot: Accuracy (0.8 to 1) vs. # Tasks (0 to 80) for Hybrid, Compare, and Rate]
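The hybrid idea (rate first, then spend comparisons on refinement) can be sketched as follows. This is a simplified illustration, not the algorithm from “Crowd-Powered Sorts and Joins”: the `hybrid_sort` function, the fixed window size, and the single left-to-right pass are assumptions made here, and `less_than` stands in for a human comparison task.

```python
from functools import cmp_to_key
from statistics import mean


def hybrid_sort(ratings, less_than, window=2):
    """Order items by average rating, then refine with comparisons.

    ratings: dict mapping item -> list of numeric ratings from workers.
    less_than(a, b): stand-in for a crowd comparison, returns True or False.
    """
    # Phase 1 (cheap, linear in # items): order by average rating.
    order = sorted(ratings, key=lambda item: mean(ratings[item]))

    def cmp(a, b):
        return -1 if less_than(a, b) else 1

    # Phase 2: slide a window over the order and re-sort each window
    # using comparisons only, fixing local mistakes left by the ratings.
    for start in range(max(len(order) - window + 1, 0)):
        chunk = order[start:start + window]
        order[start:start + window] = sorted(chunk, key=cmp_to_key(cmp))
    return order


# Hypothetical data: the ratings put "wolf" before "dog" by mistake.
ratings = {"rabbit": [1, 1], "wolf": [3, 2], "dog": [2, 4], "tiger": [5, 5]}
true_rank = {"rabbit": 0, "dog": 1, "wolf": 2, "tiger": 3}
print(hybrid_sort(ratings, lambda a, b: true_rank[a] < true_rank[b]))
```

A larger `window` buys more accuracy at the price of more comparison tasks, which is the cost and quality trade-off shown in the plots.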
  • 55. Crowdsourced Data Processing. Marketplace #1, Marketplace #2, …, Marketplace #n • Interfaces • Incentives • Trust, reputation • Spam • Pricing. Layers: Plumbing, Algorithms, Systems. Basic ops: • Compare • Filter. Complex ops: • Sort • Cluster • Clean • Get data • Verify
  • 56. Data Processing Systems. Declarative Crowdsourcing Systems: Qurk (MIT), Deco (Stanford/UCSC), CrowdDB (Berkeley). Treat crowds as just another “access method” for the database • Fetch data from disk, the web, … , the crowd • Not just process data, but also gather data.
  • 57. There Are Other Systems… (in order of increasing declarativity) • Toolkits (analogous to programming APIs) – TurKit, AutoMan – Crowds = “API calls” – Little to no optimization • Imperative Systems (analogous to Pig or Map-Reduce) – Jabberwocky, CrowdForge – Crowds = “Data Processing Units” – Programmer-dictated flow, limited optimization within the units • Declarative Systems (analogous to relational databases) – Deco, Qurk, CrowdDB – Crowds = “Data Processing Units” – Programmer specifies goal, optimized across the spectrum
  • 58. Why is Declarative Good? • Takes away repeatable code and redundancy • No need for manual optimization • Less cumbersome to specify
  • 59. What does one need? (Simple Version) 1. A Mechanism to “Store”/“Represent” Data 2. A Mechanism to “Get” More Data 3. A Mechanism to “Fix” Existing Data 4. A “Query” Language. Two prototypical systems: Deco: an end-to-end redesign. Qurk: a small modification to existing databases
  • 60. [Figure: Deco data model. Raw tables A (name), D1 (name, language), and D2 (name, capital) are filled by fetch rules (φ to name, name to language, name to capital, language to name). Example raw rows: A = {Peru, France}; D1 = {(Peru, Spanish), (Peru, Spanish), (France, French)}; D2 = {(Peru, Lima), (France, Nice), (France, Paris), (France, Paris)}. Resolution rules reduce D1 to {(Peru, Spanish), (France, French)} and D2 to {(Peru, Lima), (France, Paris)}. Joining (⋈) the resolved tables gives the user view (name, language, capital): (Peru, Spanish, Lima), (France, French, Paris).]
  • 61. Deco: Declarative Crowdsourcing DBMS. 1) Representation scheme: Countries(name, lang, capital). 2) “Get” more data: fetch rules, e.g., name → capital; capital, lang → name. 3) “Fix” data: resolution rules, e.g., lang: dedup(); capital: majority(). 4) Declarative queries from the user or application: select name from Countries where language = ‘Spanish’ atleast 5
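To make the resolution rules concrete, the Python sketch below applies `dedup()` and `majority()` to small raw tables and joins the results into a user view. It is only an illustration of the idea and does not reflect Deco’s actual implementation or API: the `resolve` helper, the in-memory lists, and the sample rows are assumptions made here.

```python
from collections import Counter, defaultdict


def dedup(values):
    """Resolution rule: drop exact duplicates, keeping first-seen order."""
    seen, out = set(), []
    for value in values:
        if value not in seen:
            seen.add(value)
            out.append(value)
    return out


def majority(values):
    """Resolution rule: keep the most frequent crowd answer."""
    return Counter(values).most_common(1)[0][0]


def resolve(raw_rows, rule):
    """Group raw (name, value) rows by name and apply a resolution rule."""
    grouped = defaultdict(list)
    for name, value in raw_rows:
        grouped[name].append(value)
    return {name: rule(values) for name, values in grouped.items()}


# Hypothetical raw answers gathered by fetch rules.
raw_language = [("Peru", "Spanish"), ("Peru", "Spanish"), ("France", "French")]
raw_capital = [("Peru", "Lima"), ("France", "Nice"),
               ("France", "Paris"), ("France", "Paris")]

languages = resolve(raw_language, dedup)    # name -> list of languages
capitals = resolve(raw_capital, majority)   # name -> single capital

# User view: join the resolved tables on name.
user_view = [(name, lang, capitals.get(name))
             for name, langs in languages.items() for lang in langs]
spanish_speaking = [name for name, lang, _ in user_view if lang == "Spanish"]
print(user_view, spanish_speaking)
```

In a real system an `atleast 5` query that finds too few rows would trigger more fetch rules; this sketch only covers the resolution and join steps.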
  • 62. Qurk • A regular old database • Human processing/gathering as UDFs – User-defined functions – Commonly also used by relational databases to capture operations outside relational algebra – Typically external API calls
  • 63. Qurk filter: inappropriate content. Table: photos(id PRIMARY KEY, picture IMAGE). Query = SELECT * FROM photos WHERE isSmiling(photos.picture); where isSmiling is a UDF. 1) Representation scheme: UDFs are “pre-declared” 2) “Get” more data: UDFs translate into one or more fixed task types 3) “Fix” data: UDFs internally handle quality assurance 4) “Query”: SQL + UDF
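The “crowd as a UDF” idea can be tried out with an ordinary database. The sketch below uses Python’s built-in `sqlite3` module, not Qurk itself: the `CROWD_VOTES` table of canned worker answers, the file names, and the majority vote inside `is_smiling` are all hypothetical stand-ins for posting tasks to a marketplace and doing quality assurance inside the UDF.

```python
import sqlite3

# Hypothetical canned worker votes per picture (1 = smiling, 0 = not).
CROWD_VOTES = {
    "a.jpg": [1, 1, 0],
    "b.jpg": [0, 0, 1],
    "c.jpg": [1, 1, 1],
}


def is_smiling(picture):
    """UDF body: collect worker votes and resolve them by majority."""
    votes = CROWD_VOTES[picture]
    return int(sum(votes) * 2 > len(votes))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call isSmiling(...).
conn.create_function("isSmiling", 1, is_smiling)
conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, picture TEXT)")
conn.executemany("INSERT INTO photos VALUES (?, ?)",
                 [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg")])

smiling_ids = [row[0] for row in conn.execute(
    "SELECT id FROM photos WHERE isSmiling(picture) ORDER BY id")]
print(smiling_ids)
```

The query text stays plain SQL while the human work hides behind the function call, which is the design point the slide makes about Qurk.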
  • 64. A Tutorial in Three Parts. Part 0: A (Super Short) Survey of Parts 1 and 2, plus Background (Me). Part 0.1: Background + Survey of Part 1. Part 0.2: Survey of Part 2. Part 1: A Survey of Crowd-Powered Data Management in Academia (Me). Part 2: A Survey of Crowd-Powered Data Management in Industry (Adam) (UP NEXT)
  • 65. A little about me. Assistant prof at Illinois since 2014. Thesis work on crowdsourced data processing. Now work on human-in-the-loop data analytics (HILDA). Twitter: @adityagp Homepage: http://data-people.cs.illinois.edu Understand, Visualize, Manipulate, Collaborate: http://populace-org.github.io http://orpheus-db.github.io http://zenvisage.github.io http://dataspread.github.io