1. AWS Government, Education, and Nonprofits Symposium
Washington, DC | June 24, 2014 - June 26, 2014
Transformational Impact of Cloud Labor
John Hoskins & Daniel Gray
jhoskins@amazon.com
djgray@amazon.com
2. Crowdsourcing Best Practices
3. Crowdsourcing myths
4. The Myths
• It’s cheaper
– It’s actually more efficient
• It’s faster
– It’s actually more scalable
• It’s not accurate
– It’s actually more accurate
5. Crowdsourcing Best Practices
6.
• Consider the question carefully
– Workers answer what you ask
• Select your workers
– Perspective and skills vary
• Iterate and Optimize
– Adjust for optimal results
7. The question
You will get an answer to the question that you ask. Focus on asking the right question.
8. Choosing your workers
Workers are different, from language and cultural differences to varying skills. Test and monitor.
9. Monitor and Improve
Monitor key metrics, then adjust and measure how key attributes impact those metrics.
10. Key Metrics
• Accuracy
– Know your current accuracy
• Throughput
– Understand both turnaround and scale requirements
• Cost
– Measure against a budget, as cost can impact the other two
“Great service, Good food, Friendly staff – you can choose two”
11. Cost
Cost is impacted most by the efficiency of the other two metrics.
12. Accuracy
Error has two sources: human and systematic. Isolating human error and solving for systematic error gives a better chance of long-term success.
13. Accuracy
After solving for systematic error, choosing the best workers and monitoring them is the next step toward high accuracy and lower costs.
14. Throughput
Many factors impact throughput:
• Reputation
• Ergonomics
• Clarity
15. http://www.mturk.com
John Hoskins, Amazon Mechanical Turk
hoskins@amazon.com
16. Cost
Cost is impacted most by the efficiency of the other two metrics. Optimization of the task and workers lowers both the cost of getting the work done and the cost of adjudicating a result.
17. Thank You
Editor’s notes
Welcome to the Crowdsourcing Best Practices portion of today’s workshop.
First, I want to dispel some biases that enterprises new to crowdsourcing often bring with them.
Everyone envisions third-world workers doing tasks for 10% of normal cost – that’s not necessarily true. Task work will cost about the same to do with a crowd as it will with other options. Where the savings come in is the efficiency of the process: 100% utilization of human capital, no overhead, no fixed fees. These add up to large overall savings. Don’t focus on getting the task done cheaply; focus on the process costs – that’s where the savings present themselves.
It’s faster – yes, but it’s faster because it’s scalable to meet demand. Work is done in parallel and at scalable levels, creating an environment where large volumes of tasks get done in less time because immediately available workers scale to the need.
Finally, prospects always say it can’t possibly be as accurate as in-house experts, but experience shows that when implemented with automation and best practices, it’s actually more accurate. Most internal workflows are measured by sampling, which doesn’t uncover the outliers and exceptions and is subject to sampling error. In fact, many customers don’t really know the true accuracy of their current workflow. Automated crowdsourcing workflows provide a confidence score on every answer, giving you the metrics you need to measure and improve accuracy to maximum levels.
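In its simplest form, the per-answer confidence score mentioned above is just agreement among the workers who answered the same task. This is an illustrative sketch, not the marketplace's actual scoring algorithm; production systems also weight by worker quality:

```python
from collections import Counter

def answer_confidence(judgments):
    """Given several workers' answers to the same task, return the
    majority answer and the fraction of workers who agree with it.
    That fraction serves as a simple per-answer confidence score."""
    counts = Counter(judgments)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(judgments)
```

For example, three workers answering "cat", "cat", "dog" yield the answer "cat" with confidence 2/3; low-confidence answers can be routed to additional workers or to adjudication.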
When people ask me how to think about crowdsourcing their workflows – or what should change in their thinking – I always come back to these three things.
Consider the question – think about how you are disintegrating your work.
Select the best workers for your work
And constantly improve
So let’s talk about the question. At Amazon, it is our belief that the better you disintegrate the steps in your workflow – the closer you get to a binary question with one right answer – the easier it is to crowdsource. The more the question requires context or interpretation, the more possibility you’ve created for error. Asking the right question, or series of questions, is the foundation of a successful crowdsourced implementation. Sometimes what you think is one question might actually be more than one.
Consider cultural context: is it important to your task, or can you define your task well enough to eliminate it? Also, don’t think in terms of skills like programming or accounting; think in terms of skills like recognition for transcribing poor handwriting, or expressiveness for keywording. [tell story of me transcribing audio] Establish the task type so that workers can self-select. Workers don’t like to be wrong; they’ll avoid tasks they aren’t good at. Then, from the pool of workers choosing your tasks, find the better ones.
Finally, establish results goals and key metrics; measure and iterate to improve.
What are the common key metrics? You might have additional ones or different priorities, but these are common across our customers, and they are interrelated.
Accuracy – what are you getting today, and what do you need? Accuracy comes at a cost, so be realistic. [story about customers often not knowing their true accuracy]
Throughput – what are the process requirements, and what opportunity does improvement provide? Often the newfound speed of retrieving information opens the door to process improvements not considered in the base ROI. [tell CPG story]
Cost – think of cost differently, as it impacts the other two: more judgments arrive at greater confidence, at greater overall task cost; higher rewards attract more workers and improve throughput; and so on. Remember, savings come in the efficiencies. In some cases we’ve seen the task cost was actually higher than internal sources, but the efficiencies and speed provided significant business impact, negating the extra spent on tasks.
I put cost third intentionally. While overall it is a key metric in almost all cases, it has many facets; here I’m simply focusing on things that you can do to set the reward you pay at its optimal amount.
Task ergonomics play a huge role in worker efficiency. That impacts throughput as mentioned – but cost as well. Scrolling large windows, load times for data elements like videos and pictures, all of these cause the workers to take extra steps or pause – costing time – and to them, their time is money.
Finally, there’s the sociological aspect of the task: overall, workers like knowing the purpose – what you’re trying to accomplish – because it helps them understand how to answer. Workers are also attracted to fun tasks like reading tweets or looking at photos. I’m not saying only do fun tasks; I’m saying consider the boredom factor in pricing your tasks. Typical database cleansing in the marketplace pays a little better than photo moderation due to the boredom factor.
Although it can all be attributed to humans making mistakes, isolating and correcting the cause of the most common errors builds greater overall accuracy. Mistakes come in two forms: humans simply making an error (human error), and what I’ve termed systematic error (commonly called ????). Systematic error is typically caused by things like poor instructions, ambiguous data, or unclear questions. By establishing a good sample workforce, like Mechanical Turk Masters, you can begin to test for and improve systematic error. Look for outliers – large levels of disagreement – and root-cause the specific tasks to see if improvement can eliminate them.
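The outlier hunt described above can be sketched as a disagreement filter: tasks where trusted workers split heavily are candidates for systematic error (bad instructions, ambiguous data) rather than individual mistakes. The 0.4 threshold is an illustrative assumption to tune on your own data:

```python
from collections import Counter

def flag_systematic_errors(results, max_disagreement=0.4):
    """Flag tasks whose worker answers disagree heavily.
    `results` maps task_id -> list of answers from different workers.
    Disagreement is the fraction of workers outside the majority;
    tasks above the threshold are returned for root-cause review."""
    flagged = []
    for task_id, answers in results.items():
        top_votes = Counter(answers).most_common(1)[0][1]
        disagreement = 1 - top_votes / len(answers)
        if disagreement > max_disagreement:
            flagged.append(task_id)
    return flagged
```

A cluster of flagged tasks sharing the same input shape usually points at the task design, not the workers.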
After solving for systematic error, and with a clear picture of what to expect, you can now begin measuring your workers to see whether some are better than others. Look for accuracy on known answers (using the known-answer API) and high levels of agreement with other workers who have high gold-standard scores. Use that data to build a confidence score on each answer, establishing a key system metric to monitor.
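Combining gold-standard scores with agreement, as described above, can be sketched as a vote weighted by each worker's measured accuracy. This is a simplified illustration, not the marketplace's actual scoring scheme:

```python
def weighted_confidence(judgments, worker_accuracy):
    """Combine workers' answers into one answer plus a confidence score,
    weighting each vote by that worker's accuracy on known (gold) tasks.
    `judgments` is a list of (worker_id, answer) pairs;
    `worker_accuracy` maps worker_id -> fraction correct on gold tasks."""
    weights = {}
    for worker_id, answer in judgments:
        weights[answer] = weights.get(answer, 0.0) + worker_accuracy[worker_id]
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())
```

A worker who is 90% accurate on gold tasks thus counts for nearly twice the vote of one who is 50% accurate, so two strong workers can outvote a larger number of weak ones.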
Response times are impacted by many factors. Initially, you’re as new to the workers as they are to you, and how you establish that brand can impact your long-term throughput. Workers are looking for Requesters with clearly defined tasks that they know they can do accurately, and who adjudicate fairly and pay quickly. Think in terms of worker efficiencies: you are ultimately paying workers for their time, and doing things that allow them to be more efficient saves them time – something as simple as prepopulating a web search you want done. Finally, clarity of task impacts throughput: helping workers understand how you want the question answered and how to handle edge cases gives them greater confidence to answer correctly and avoid mistakes, thereby improving their desire to do the tasks.
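The prepopulated-web-search tip above is a one-liner worth doing at scale: embed a ready-made search link in the task so workers click instead of typing. The search-engine base URL here is just an example:

```python
from urllib.parse import urlencode

def prepopulated_search_url(query, base="https://www.google.com/search"):
    """Build a ready-made search link to embed in a task template,
    saving each worker the step of typing the query themselves.
    urlencode handles spaces and special characters in the query."""
    return f"{base}?{urlencode({'q': query})}"
```

A few seconds saved per task, multiplied across thousands of tasks and workers paid for their time, shows up directly in both throughput and cost.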