"Is there anything else I can help you with?": Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent
Ting-Hao K. Huang, Walter S. Lasecki, Amos Azaria, Jeffrey P. Bigham.
In Proceedings of Conference on Human Computation & Crowdsourcing (HCOMP 2016), 2016, Austin, TX, USA.
1. “Is there anything else I can help you with?”: Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent
Ting-Hao K. Huang
Walter S. Lasecki
Amos Azaria
Jeffrey P. Bigham
2. Challenges of Open Conversation
• Goal: A system that users can converse with
• General Purpose Dialog System
– Combining multiple dialog systems
• DialPort (Zhao, et al., 2016)
– Adapting a model to many other domains
• Walker, et al., 2007; Sun, et al., 2016
– Chit-chat system
• Hold social conversations (Banchs, et al., 2012)
• It is still a very hard problem…
– Alexa Prize: $2.5 Million
• “… achieves the grand challenge of conversing coherently and engagingly with humans on popular topics for 20 minutes.”
3. What birthday gift should I get for Laila?
Sorry I can not understand your question.
Kenneth’s apartment.
4. Chorus: A Crowd-powered Conversation Assistant
• Crowd workers collectively hold a conversation by:
1. Proposing responses
2. Voting on responses
3. Taking notes
Lasecki, W. S.; Wesley, R.; Nichols, J.; Kulkarni, A.; Allen, J. F.; and Bigham, J. P. 2013.
Chorus: A crowd-powered conversational assistant. In UIST 2013, UIST ’13, 151–162.
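The propose / vote / take-notes loop above can be illustrated with a small sketch. This is not the Chorus implementation: the class names, the `votes_needed` threshold, and the rule that workers cannot vote for their own proposal are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A candidate response proposed by one crowd worker."""
    worker_id: str
    text: str
    voters: set = field(default_factory=set)
    accepted: bool = False


class CrowdSession:
    """Toy model of one conversation handled by several workers."""

    def __init__(self, votes_needed=2):
        # votes_needed is a hypothetical acceptance threshold.
        self.votes_needed = votes_needed
        self.proposals = []
        self.notes = []
        self.sent = []  # responses forwarded to the user

    def propose(self, worker_id, text):
        proposal = Proposal(worker_id, text)
        self.proposals.append(proposal)
        return proposal

    def vote(self, worker_id, proposal):
        """Record a vote; return True only if this vote gets the response sent."""
        # Assumed rule: workers cannot vote for their own proposal.
        if worker_id == proposal.worker_id:
            return False
        proposal.voters.add(worker_id)  # a set ignores repeat votes
        if not proposal.accepted and len(proposal.voters) >= self.votes_needed:
            proposal.accepted = True
            self.sent.append(proposal.text)
            return True
        return False

    def take_note(self, worker_id, note):
        # Notes are shared context for workers who join later.
        self.notes.append((worker_id, note))


session = CrowdSession(votes_needed=2)
p = session.propose("w1", "A cold pumpkin congee should be fine.")
session.vote("w2", p)
session.vote("w3", p)
print(session.sent)
```

A fixed vote count is only one possible agreement rule; a deployed system could just as well weight votes or scale the threshold with the number of active workers.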
6. Research Questions
• How hard is it to deploy such a system?
– Real-time crowdsourcing +
– Conversational interface +
– Intelligent agent
• How will users use it?
• Will workers be able to handle all the tasks?
7. We deployed Chorus
• Launched on May 20th, 2016.
• 113 users used it during 937 conversational sessions
8. How to recruit workers fast on-demand?
• Two Common Practices
– Start recruiting on-demand (Bigham, et al., 2010)
• Pros: Workers are engaged when waiting
• Cons: Expensive to have workers wait longer
– Keep workers on-call (Retainer) (Bernstein, et al., 2011)
• Pros: Quick response
• Cons: A retainer runs on money / “Cold start”
• Both are designed for short tasks
11. Is this recruiting method fast enough?
• Average time to the first crowd response = 88.351 sec
– 21.55% of first crowd responses arrived within 30 sec
– 56.08% within 1 min
– 81.77% within 2 min
– 90.06% within 3 min
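Figures of this kind (a mean plus the share of sessions answered within each cutoff) can be computed from per-session latencies as in the sketch below. The function name and the sample latencies are made up for illustration and are not the deployment data; treating "within" as inclusive of the cutoff is also an assumption.

```python
from statistics import mean


def latency_summary(latencies_sec, thresholds_sec=(30, 60, 120, 180)):
    """Return (mean latency, {threshold: percent of sessions answered within it})."""
    if not latencies_sec:
        raise ValueError("need at least one latency")
    n = len(latencies_sec)
    within = {
        t: 100.0 * sum(1 for x in latencies_sec if x <= t) / n
        for t in thresholds_sec
    }
    return mean(latencies_sec), within


# Made-up first-response latencies in seconds, for illustration only.
sample = [12.0, 25.0, 48.0, 55.0, 70.0, 95.0, 110.0, 150.0, 200.0, 240.0]
avg, within = latency_summary(sample)
print(f"average = {avg:.3f} sec")
for threshold, pct in within.items():
    print(f"{pct:.2f}% within {threshold} sec")
```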
14. Inappropriate Workers
come on...... This is a YouTube link... Not how to backup my MySQL database
but it’s funny
what up b****h
Try that [The YouTube link of “Bryan Cranston’s Super Sweet 60” of “Jimmy Kimmel Live”]
U: [Ask How to backup a MySQL database]
15. Flirter Workers
You mean username?
we need to verify your name
whats your name user?
what ?
Or my name?
real name
both
…
16. Spammer Workers
• We know they exist
• 3 Main Actions
– Message
• “how are you”, “yeah”, “yes (or no)”, “Sure you can”, or “It suits you best.”
– Note
• “user is dumb”
• “like all the answers.”
– Vote
• Upvote on almost all messages
17. How does Chorus detect abusive language?
• Word Matching
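The slide gives no detail beyond "word matching", so the following is only a minimal sketch of what token-level matching against a blocklist could look like. The blocklist entries, the tokenizer, and the function name are placeholders, not Chorus's actual word list or code.

```python
import re

# Placeholder terms only; a real deployment would maintain its own list.
BLOCKLIST = {"badword", "slur"}

# Lowercase alphanumeric tokens (apostrophes kept so "don't" stays one token).
TOKEN_RE = re.compile(r"[a-z0-9']+")


def flagged_words(message, blocklist=BLOCKLIST):
    """Return the tokens of `message` that appear in the blocklist."""
    tokens = TOKEN_RE.findall(message.lower())
    return [t for t in tokens if t in blocklist]


print(flagged_words("You are a BADWORD!"))
print(flagged_words("what up b****h"))
```

Exact token matching is simple and fast, but it misses masked or misspelled variants: a message such as "what up b****h" tokenizes to "what", "up", "b", "h" and is not flagged.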
18. Malicious Users
• Abusive Language
– Sexual content
– Profanity
– Hate speech
– Threats of criminal acts
• Solutions
– Word detection
20. “Is there anything else I can help you with?”
Can I have pumpkin congee? The cold ones
Maybe not now..
Why keep asking?
That should be fine
Is there anything else I can help you with?
That would be great actually. :)
Is there anything else?
…
U: [Ask what can he/she eat or drink after a dental surgery]
21. The Dynamics of User Intent
U: Nope
Are you sure?
Any other question?
to confirm exit please type EXIT
or if you want funny cat jokes type CATS
CATS
…
22. User Timeout
I see... so I will need to check the traffic at different times of the day
Did you try Google traffic alerts?
Are you there?
Please wait for a few minutes...
U: Is there an easy way to check traffic status between Miami and Key West?
[User didn’t respond for 2 minutes]
23. How does Chorus know a conversation is over?
• User & Crowd Timeout
• Crowd Voting
– Once 2 workers click “Conversation is over”
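The two ending conditions above (a timeout on either side, or two workers marking the conversation as over) can be sketched as a single check. Only the two-vote threshold comes from the slide; the timeout values, field names, and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Minimal state needed to decide whether a conversation has ended."""
    last_user_msg_at: float   # timestamp (seconds) of the user's last message
    last_crowd_msg_at: float  # timestamp (seconds) of the crowd's last message
    over_votes: set = field(default_factory=set)  # workers who clicked "Conversation is over"


def vote_over(state, worker_id):
    # Using a set means a worker clicking twice still counts once.
    state.over_votes.add(worker_id)


def end_reason(state, now, user_timeout=600.0, crowd_timeout=600.0, votes_needed=2):
    """Return why the session should end, or None if it should stay open.

    The timeout defaults are placeholders, not values reported for Chorus.
    """
    if len(state.over_votes) >= votes_needed:
        return "crowd_vote"
    if now - state.last_user_msg_at >= user_timeout:
        return "user_timeout"
    if now - state.last_crowd_msg_at >= crowd_timeout:
        return "crowd_timeout"
    return None


state = SessionState(last_user_msg_at=0.0, last_crowd_msg_at=5.0)
vote_over(state, "w1")
vote_over(state, "w2")
print(end_reason(state, now=30.0))
```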
25. Collective Identity & Personality
I was wondering about your name. Why is it Chorus Bot?
How long has it been for you here?
Is there anything I can help you with?
About 3 minutes
I am not sure.
I’m new to this.
26. Subjective Questions
This worker’s opinion is that God does not exist.
I believe in a God, but not necessarily all of the things in the Bible
Evolution can’t be disproven, but neither can creationism in a sense.
Is that all?
U: Do you believe Bible is God’s word?
[ Few messages later ]
27. Requests For Action
Chorus Bot can’t reserve tables :( ?
I can reserve a table for you if you prefer
[Suggested the user to call a restaurant’s number to make a reservation.]
what time and how many people?
28. To conclude…
1. Malicious Workers & Users
– Content
2. Identifying the End of a Conversation
– Boundary
3. When Consensus Is Not Enough
– Scope
29. What’s next?
• Why can Chorus talk?
– Decompose human workers’ tasks into sub-tasks
– Which sub-tasks can be automated?
• What can we learn from the data?
– User’s questions
– Crowd Response
– Votes
32. References
• Zhao, T., Lee, K., & Eskenazi, M. (2016). DialPort: Connecting the Spoken Dialog Research Community to Real User Data. arXiv preprint arXiv:1606.02562.
• Banchs, R. E., & Li, H. (2012, July). IRIS: a chat-oriented dialogue system based on the vector space model. In Proceedings of the ACL 2012 System Demonstrations (pp. 37-42). Association for Computational Linguistics.
• Walker, M. A., Stent, A., Mairesse, F., & Prasad, R. (2007). Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research, 30, 413-456.
• Sun, M. (2016). Adapting Spoken Dialog Systems Towards Domains and Users (Doctoral dissertation, YAHOO! Research).
• Bigham, J. P., Jayant, C., Ji, H., Little, G., Miller, A., Miller, R. C., ... & Yeh, T. (2010, October). VizWiz: nearly real-time answers to visual questions. In Proceedings of the 23rd annual ACM symposium on User interface software and technology (pp. 333-342). ACM.
• Bernstein, M. S., Brandt, J., Miller, R. C., & Karger, D. R. (2011, October). Crowds in two seconds: Enabling realtime crowd-powered interfaces. In Proceedings of the 24th annual ACM symposium on User interface software and technology (pp. 33-42). ACM.