1. The human factor in big data
BDVe webinar series
November 6th 2018
Elena Simperl, University of Southampton, UK
@esimperl
2. The data economy
Big data: Volume, Velocity, Variety, Veracity
• Data value chains as a driver for growth and change
• Transformative impact leading to new infrastructure, businesses, politics and social interactions
• Created, refined, valued and exchanged unlike any other resource
• Alters the rules for markets and demands new approaches from regulators
3. Example: Disrupting transport
Smart cities have access to more data than ever to inform policy and service design
Driverless cars, electrification and connectivity are transforming the automotive industry
Machine learning and AI can help optimise traffic, support future planning and improve fuel efficiency
4. Challenges
Data availability
• Collecting missing data
• Labelling data to train and validate algorithms
• Improving data quality
• Integrating across sources
Data use
• Making decisions inclusively
• Enabling the free flow of data
• Innovating responsibly
Many of these tasks are automated, but technology has limitations
Legal, economic, social, ethical implications
6. The human factor in big data
• More and better data
• Training and validating algorithms
• Engaging and empowering citizens, customers etc.
9. Organisations struggle to leverage the human factor
• What form of crowdsourcing to choose?
• How to engage with the crowd?
• Why would the crowd care?
• How do we control the quality?
• Does it need to be in real time?
• Can we afford it at scale?
10. Qrowd
Innovation action, part of the Big Data Value PPP
Started in December 2016, 3 years, €3.9M
8 partners from 5 European countries, coordinated by the University of Southampton
Smart city solutions combining crowd and computational intelligence
Piloted in transportation with:
• A medium-sized smart city
• A leading navigation and traffic management service provider
11. Enabling data value chains
Platform for data and process (data flow) integration
• Standards compliant, interoperable, open, no vendor lock-in
• Leverages existing technology stacks
• Used by industry partners
• Extendable and scalable to adapt to new urban contexts
12. The human factor in Qrowd
• Mix of open innovation methods to co-design pilots and encourage stakeholder participation
• Value-centric approach to platform design: personal data empowerment, open source, building upon existing standards
• Sustainable urban auditing through online and mobile crowdsourcing
• Human-in-the-loop (HIL) architecture to improve the accuracy of predictions
13. More than just technology
• Supports deployment of human-machine workflows throughout
• Interfaces to multiple crowdsourcing services
• Complemented by methodology and guidelines
• Data protection by design
14. The ‘what, who, how, why’ methodology
What
• Tasks you can't complete in-house or using computers
• A question of time, budget, resources, ethics etc.
Who
• Crowdsourcing ≠ ‘turkers’
• Open call, biased via choice of platforms and promotion channels
• No traditional means to manage and incentivise
• The crowd often has little to no context about the project
How
• Macro vs. microtasks
• Complex workflows
• Assessment and aggregation
• Timeliness of results
Why
• Different crowds with different motivations
• Incentives influence motivations
• Aligning incentives
15. Using the methodology
Who is it for
• Organisations interested in increasing participation via crowdsourcing
• Technology providers implementing HIL architectures
How can it be used
• Provides a process model starting with the What, followed by the Who, which then determine the How; every What/Who/How decision impacts the Why (see the sketch after this slide)
• Can be used with or without the Qrowd platform
• Helps specify goals and decide what forms of crowdsourcing to use
• Helps roll out crowdsourcing projects and use their results effectively
• Helps understand motivations and incentives and their role in successful projects
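To make the process model concrete, here is a minimal Python sketch of how a project could record its What/Who/How/Why decisions as a simple checklist. The field names and example values are illustrative assumptions, not a schema prescribed by the Qrowd methodology.

# Illustrative only: one record of the What/Who/How/Why decisions for a
# crowdsourcing project; field names and values are assumptions.
from dataclasses import dataclass, field

@dataclass
class CrowdsourcingPlan:
    what: str   # the task that cannot be done in-house or fully automatically
    who: list   # candidate crowds and recruitment channels
    how: dict   # task design, assignment, aggregation, timeliness
    why: dict = field(default_factory=dict)  # motivation/incentive per crowd

# Hypothetical plan for the parking example used in the following slides
parking_audit = CrowdsourcingPlan(
    what="Collect up-to-date locations of parking spaces in the city",
    who=["paid crowd workers", "OpenStreetMap community", "SatNav users"],
    how={
        "task design": "microtasks over virtual street imagery",
        "aggregation": "majority vote over redundant answers",
        "timeliness": "batch collection, refreshed periodically",
    },
    why={
        "paid crowd workers": "micropayments (money)",
        "OpenStreetMap community": "community value (love)",
    },
)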
16. Examples
Urban auditing: collect up-to-date information about parking spaces in a city
Modal split: collect training data to predict the use of different means of transport
17. What
In general
• Something you cannot do using traditional means or that requires broader engagement
• Something you cannot do (fully) automatically – a data collection or analysis task
In our examples
• Parking: we need a dataset with all parking spaces in a city (alternatively: parking availability). Traditional surveys are too costly.
• Modal split: we need trips involving different means of transport and labels for each trip segment. This data is not available and is needed to train AIs.
18. What task am I trying to solve?
Can I solve it via other means: buy the data, label in-house, use less/noisier data etc.?
19. Who
In general
• An open (‘unknown’) crowd
• Scale helps solve the problem faster
• Some tasks will have time, location or skills constraints (hence a smaller crowd, hence slower or costlier)
In our examples
• Parking
  • People who are familiar with an urban area, e.g. the OpenStreetMap community, citizens
  • Drivers using a SatNav
  • Paid crowd workers
  • Social media users
• Modal split
  • Commuters, tourists, people using transport
20. Who is my crowd?
How do I recruit participants?
What are my requirements?
Can I find volunteers?
Shall I use a crowdsourcing platform?
21. How: Process
In general
• Many ways to implement tasks: specialised platforms, social media, extension of an existing system etc.
• Tasks broken down into smaller units, undertaken in parallel by different people
  • Does not apply to all forms of crowdsourcing – sometimes the breakdown is part of the solution!
  • Does not apply to creative tasks, underexplored problem spaces etc.
• Task assignment to match skills, preferences and contribution history
  • Example: random assignment vs. meritocracy vs. full autonomy
• Explicit vs. implicit participation
  • Affects motivation
• Partial or independent answers consolidated and aggregated into a complete solution (see the aggregation sketch after this slide)
  • Example: challenges (e.g. Netflix) vs. aggregation (e.g. Wikipedia)
• Real-time answers
  • Require alternative models and incentives
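As a concrete illustration of the breakdown/redundancy/aggregation pattern above, the following Python sketch consolidates redundant answers to one microtask by majority vote. The agreement threshold is an assumption for illustration, not Qrowd's actual quality-control policy.

# Illustrative sketch: each microtask is answered redundantly by several
# contributors; their independent answers are consolidated by majority vote.
from collections import Counter

def aggregate_by_majority(answers, min_agreement=0.6):
    """Return the majority answer, or None if agreement is too low to trust."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes / len(answers) >= min_agreement else None

# Example: three contributors judge whether a street segment contains parking spaces
print(aggregate_by_majority(["yes", "yes", "no"]))    # "yes" (2/3 agreement)
print(aggregate_by_majority(["yes", "no", "maybe"]))  # None -> collect more answers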
22. How: Process
In our example – parking
1. Crowdsourcing platform: Virtual City Explorer tool using virtual street imagery. Participants are paid.
2. Extension of an existing system: SatNav prompting the user to answer questions about parking availability. Contributions could be incentivised.
3. Data collection app: the i-Log app launches challenges to collect parking pictures in a city. The best pictures receive a prize.
23. Virtual City Explorer
• Crowdsourcing platform for urban auditing, developed at the University of Southampton
• People explore a virtual city via street imagery
• They solve small tasks against micropayments
• The VCE validates answers, consolidates data and analyses user behaviour to propose optimisations (a consolidation sketch follows this slide)
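As an illustration of the consolidation step mentioned in the last bullet, the sketch below merges crowd-reported point locations (e.g. parking spaces spotted in street imagery) that fall within a small radius of each other and keeps a location only if more than one contributor reported it. The greedy merge, the radius and the agreement rule are assumptions for illustration, not the VCE's published algorithm.

# Illustrative consolidation of crowd-reported (lat, lon) points.
import math

def consolidate_reports(points, radius_m=15.0):
    """Greedily merge nearby reports; keep locations confirmed by >= 2 contributors."""
    clusters = []  # each cluster is a list of (lat, lon) reports
    for lat, lon in points:
        for cluster in clusters:
            c_lat = sum(p[0] for p in cluster) / len(cluster)
            c_lon = sum(p[1] for p in cluster) / len(cluster)
            # rough metres-per-degree conversion, adequate at city scale
            d = math.hypot((lat - c_lat) * 111_320,
                           (lon - c_lon) * 111_320 * math.cos(math.radians(c_lat)))
            if d <= radius_m:
                cluster.append((lat, lon))
                break
        else:
            clusters.append([(lat, lon)])
    return [cluster for cluster in clusters if len(cluster) >= 2]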
24. i-Log and QrowdLab
i-Log is an Android application developed at the University of Trento, used for people-centric sensing
QrowdLab is a citizen innovation lab set up in Trento to engage with citizens on city matters
• We need tools to connect with the citizens
• We need data to understand patterns of behaviour and collect missing data
• We need feedback on how people interact with the city and its infrastructure
25. How: Process
In our example – modal split
• A combination of a machine learning classifier, citizen sensing and labelled data collected via gamified challenges (see the sketch after this slide)
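A minimal human-in-the-loop sketch of that combination, assuming scikit-learn is available: a classifier is trained on crowd-labelled trip segments, and segments it cannot classify confidently are routed back to the crowd (e.g. as a gamified i-Log challenge) for labelling. The features, labels and threshold are hypothetical; this is not the Qrowd pipeline itself.

# Illustrative human-in-the-loop classifier for transport-mode detection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical features per trip segment: [mean speed (km/h), speed variance, stop rate]
labelled_X = np.array([[4.5, 1.2, 0.9],      # walking
                       [18.0, 25.0, 0.4],    # cycling
                       [32.0, 80.0, 0.6],    # bus
                       [55.0, 150.0, 0.2]])  # car
labelled_y = np.array(["walk", "bike", "bus", "car"])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(labelled_X, labelled_y)

def classify_or_ask_crowd(segment_features, confidence_threshold=0.7):
    """Return the predicted mode, or flag the segment for crowd labelling."""
    proba = clf.predict_proba([segment_features])[0]
    if proba.max() >= confidence_threshold:
        return clf.classes_[proba.argmax()]
    return "ASK_CROWD"  # queue a labelling challenge for the crowd

print(classify_or_ask_crowd([5.0, 1.0, 0.8]))    # confident prediction, e.g. "walk"
print(classify_or_ask_crowd([25.0, 60.0, 0.5]))  # ambiguous segments go back to the crowd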
26. Where do I deploy crowdsourcing? Do I need a new system?
How do I allocate tasks to people? Or do I let them choose freely how to contribute?
How do I deal with low-quality solutions? Can I recognise good solutions easily?
27. Why: money, love or glory
Love and glory reduce costs
Money and glory make the crowd move faster
Intrinsic vs. extrinsic motivation
• Rewards/incentives influence motivation
Successful unpaid crowdsourcing is difficult to predict or replicate
• Highly context-specific
• Not applicable to arbitrary tasks
Reward models are often easier to study and control (if performance can be reliably measured)
• Not always easy to abstract from social aspects (free-riding, social pressure)
• May undermine intrinsic motivation
28. Why
In our examples
• Who benefits from the results?
• Who owns the results?
• How much effort does it require from the crowd?
Money
• Different models: pay-per-time, pay-per-unit, winner-takes-all (see the cost sketch after this slide)
• Define the rewards, analyse the trade-off between accuracy and cost, avoid spam
Love
• OpenStreetMap, games, citizen panels
Glory
• Competitions, awards
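A back-of-the-envelope Python sketch of how the reward models above can be compared on cost; all figures are hypothetical and only illustrate the kind of accuracy-versus-cost analysis suggested here.

# Hypothetical cost comparison of the three reward models
tasks_needed = 1_000   # e.g. street segments to audit
redundancy = 3         # answers collected per task for quality control

price_per_unit = 0.05  # EUR, pay-per-unit: fixed price per completed microtask
cost_pay_per_unit = tasks_needed * redundancy * price_per_unit

hourly_rate = 9.0      # EUR, pay-per-time: hourly rate
tasks_per_hour = 60    # assumed throughput
cost_pay_per_time = (tasks_needed * redundancy / tasks_per_hour) * hourly_rate

prize = 500.0          # EUR, winner-takes-all: one prize regardless of participation
cost_winner_takes_all = prize

print(f"pay-per-unit:     {cost_pay_per_unit:.2f} EUR")      # 150.00
print(f"pay-per-time:     {cost_pay_per_time:.2f} EUR")      # 450.00
print(f"winner-takes-all: {cost_winner_takes_all:.2f} EUR")  # 500.00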
29. Why would anyone care to contribute?
Is the task intrinsically rewarding?
What would motivate people to participate?
How do I sustain participation?
30. Leveraging the human factor
• The most sophisticated AI systems showcase ingenious combinations of human and machine intelligence
• Crowdsourcing can augment any aspect of the data value chain
• Our methodology can help organisations understand how to use crowdsourcing effectively
• Qrowd develops a platform with integrated crowdsourcing support to deploy hybrid data collection and analysis workflows
31. Further reading
• Qrowd project: qrowd-project.eu, @QrowdProject
• Figure Eight: figure-eight.com
• Simperl, E. (2015). How to use crowdsourcing effectively. LIBER Quarterly. https://www.liberquarterly.eu/articles/10.18352/lq.9948/
• Grier, D. A. (2007). When Computers Were Human.
• Malone, T. W., Laubacher, R., & Dellarocas, C. (2010). The collective intelligence genome. MIT Sloan Management Review, 51(3), 21.
• Dawson, R., & Bynghall, S. (2011). Getting Results from Crowds: The Definitive Guide to Using Crowdsourcing to Grow. Advanced Human Technologies.