This document discusses the complexities of developing practical web automation tools, based on observations from user studies. It identifies three challenges: dealing with uncertainty caused by the lack of standards in web technologies, maximizing user trust by ensuring user control and recoverability, and minimizing cognitive load. It describes an example automation assistant that guides users step-by-step through tasks by suggesting actions and automating one action at a time with user confirmation. The conclusion notes some successes, but finds that full automation has not been achieved because of uncertainty stemming from the lack of standardization, and that building trust and usability remain key challenges.
1. Yury Puzis, Yevgen Borodin, I.V. Ramakrishnan
Complexities of Practical
Web Automation
Stony Brook University
2015
NSF Grant No. IIS-1218570
2. Contents
Goal: help design practical web automation tools by sharing
observational experience
❖ Human-Computer Interaction Perspective
❖ Technical Perspective
❖ Example: Automation Assistant
❖ Conclusion
3. Why Web Automation?
❖ Problem: non-visual browsing is hard
❖ It is hard (or impossible) to find relevant information
and easy to become overwhelmed by what is irrelevant
❖ There are many shortcuts (gestures) to learn, and it is
hard (or impossible) to accomplish non-trivial tasks
❖ Web automation has the potential to enable visually
impaired users to breeze through Web browsing tasks that
were previously slow, hard, or even impossible to achieve
4. Observation
[Diagram 1, observation: User, Environment, and Web Automation Tool, connected by "Browsing Actions" and "Events"]
[Diagram 2, automation: User, Environment, and Web Automation Tool, connected by "Events" and "Automation Instructions"]
5. Maximizing Trust
❖ Gaining and maintaining user trust is the cornerstone of
web automation: even a few disasters are a big problem
❖ The user needs to know and influence what will happen
(review, parameterize, choose) and what has happened
(review, revert, recover) at all times
❖ Failure is inevitable and has to be graceful: terminate
automation, ignore the failed action, take corrective action,
or suggest that the user take corrective action
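The four failure responses listed above can be sketched as a per-step policy. This is a minimal illustration, not the authors' implementation: `OnFailure`, `Step`, and `execute` are hypothetical names, and the returned log stands in for the "review what has happened" requirement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class OnFailure(Enum):
    TERMINATE = "terminate"  # stop the whole automation
    IGNORE = "ignore"        # skip the failed action and continue
    CORRECT = "correct"      # run a corrective action automatically
    SUGGEST = "suggest"      # tell the user which corrective action to take


@dataclass
class Step:
    name: str
    run: Callable[[], None]  # hypothetical action; raises on failure
    on_failure: OnFailure = OnFailure.TERMINATE
    corrective: Optional[Callable[[], None]] = None
    suggestion: str = ""


def execute(steps: List[Step]) -> List[str]:
    """Run steps in order and return a log the user can review afterwards."""
    log: List[str] = []
    for step in steps:
        try:
            step.run()
            log.append(f"ok: {step.name}")
        except Exception as exc:
            log.append(f"failed: {step.name} ({exc})")
            if step.on_failure is OnFailure.TERMINATE:
                log.append("automation terminated")
                break
            if step.on_failure is OnFailure.CORRECT and step.corrective:
                step.corrective()
                log.append(f"corrected: {step.name}")
            elif step.on_failure is OnFailure.SUGGEST:
                log.append(f"suggestion: {step.suggestion}")
            # OnFailure.IGNORE simply falls through to the next step
    return log
```

Defaulting to `TERMINATE` is a deliberate choice in this sketch: when nothing is known about a failure, stopping is the response least likely to cause one of the "disasters" that cost user trust.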
6. Minimizing End-To-End Cost
❖ End-to-end cognitive load and operation time must be
lower with automation than without it
❖ Web automation costs: managing creation, execution and
consequences of automation; context switching
❖ Screen-reader and browser costs: many and well known,
including the need to plan complex sequences of actions
from memory, or to resort to exhaustive search and guess,
guess, guess
❖ Minimizing cost is in conflict with the need to maximize
trust
7. Dealing with Uncertainty
Goal: automate user intent without resorting to handcrafted
scripts (programming), and interpret the environment's reaction
Problem: we can only guess
❖ Semantics of user browsing actions
❖ Semantics of environment events
❖ Semantics of webpage elements
8. Making Observations
❖ Goal: make meaningful observations from events
❖ Problem: browsing actions can trigger multiple
(including cascading) events, and there are different
types of events: e.g., shortcut press -> JavaScript call ->
DOM mutation -> virtual cursor movement
❖ Problem: over time, an event may change its semantics
(same event - different results) or implementation
(different event - same results)
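One way to turn a cascade of events into a single observation is to group events that arrive close together in time. The sketch below assumes a hypothetical `Event` record and a guessed `max_gap_ms` threshold; timing-based grouping is only a heuristic chosen for illustration, and it will mis-group scheduled or slow asynchronous events.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    time_ms: int  # when the event was observed
    kind: str     # e.g. "shortcut", "js_call", "dom_mutation", "cursor_move"


def group_cascades(events: List[Event], max_gap_ms: int = 100) -> List[List[Event]]:
    """Group events so that each group is one candidate observation.

    An event joins the current group if it follows the previous event
    by at most max_gap_ms; otherwise it starts a new group.
    """
    groups: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.time_ms):
        if groups and event.time_ms - groups[-1][-1].time_ms <= max_gap_ms:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups
```

With the cascade from the slide (shortcut press, JavaScript call, DOM mutation, virtual cursor movement) arriving within a few tens of milliseconds, all four events land in one group, and a later key press starts a new one.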
9. Addressing Webpage Elements
❖ Goal: identify the target webpage element
❖ Problem: most addressing approaches are designed to
query the DOM for elements at a specified address, but we
need to query the DOM for the address of a specified element
❖ Solutions: sloppy programming, machine learning, etc.,
but no unbreakable approach exists
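The reverse query (from element to address) can be illustrated with a positional path built by walking up from the element. This is a sketch under simplifying assumptions: it uses Python's `xml.etree.ElementTree` on well-formed markup as a stand-in for a browser DOM, and `address_of` is a hypothetical helper, not one of the solutions named above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def address_of(root: ET.Element, target: ET.Element) -> str:
    """Return a positional path from root to target, e.g. ./body[1]/div[2].

    Raises KeyError if target is not a descendant of root.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of: Dict[ET.Element, ET.Element] = {
        child: parent for parent in root.iter() for child in parent
    }
    steps: List[str] = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


doc = ET.fromstring(
    "<html><body><div><a>x</a></div><div><a>y</a><a>z</a></div></body></html>"
)
link = doc.findall(".//a")[2]
path = address_of(doc, link)
print(path, doc.find(path) is link)
```

Such an address is easy to compute but brittle: if the page later gains another `div` ahead of the target, the same path resolves to a different element, which is one concrete form of the "no unbreakable approach" problem.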
10. Detecting Action Completion
❖ Goal: wait for action to complete (succeed or fail) before
continuing to interact with the user & the environment
❖ Problem: no standard way to specify action completion;
cascading, asynchronous and scheduled JavaScript
events make things harder
❖ Solutions: listen to all relevant JavaScript events
through callback functions; timeout; wait for predefined
DOM mutations / value changes (success or failure)
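The combination of a timeout with predefined success and failure conditions can be sketched as a polling loop. This is an assumption-laden simplification: `succeeded` and `failed` are hypothetical caller-supplied checks (for example, "the confirmation element appeared" or "an error message appeared"), and polling stands in for the event callbacks a real tool would register.

```python
import time
from typing import Callable


def wait_for_completion(
    succeeded: Callable[[], bool],
    failed: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_s: float = 0.05,
) -> str:
    """Block until the action succeeds, fails, or the timeout expires.

    Returns "success", "failure", or "timeout".
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if failed():
            return "failure"
        if succeeded():
            return "success"
        time.sleep(poll_s)
    return "timeout"
```

Returning "timeout" as a third outcome, rather than folding it into failure, lets the caller decide how to recover, for instance by asking the user to review the page state.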
11. Example: Automation Assistant
❖ Observes everything the user is doing (no macros)
❖ Guides the user through browsing tasks step-by-step
❖ suggests several alternative browsing actions based on
user’s prior actions
❖ automates only one action at a time
❖ each set of suggestions is explicitly requested, each
action is explicitly chosen, each outcome is reviewed
❖ No context switch between automation and screen-reading
Puzis Y., Borodin Y., Puzis R., Ramakrishnan I. V.,
Predictive Web Automation Assistant for people with vision impairments. WWW '13.
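The idea of suggesting alternative next actions from the user's prior actions can be illustrated with a simple frequency count over observed action pairs. This is a toy stand-in chosen for illustration, not the model used in the cited paper; `build_model`, `suggest`, and the action names are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_model(history: List[str]) -> Dict[str, Counter]:
    """Count, for each action, which actions followed it in the history."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        model[prev][nxt] += 1
    return model


def suggest(model: Dict[str, Counter], last_action: str, k: int = 3) -> List[str]:
    """Return up to k actions that most often followed last_action."""
    followers = model.get(last_action, Counter())
    return [action for action, _ in followers.most_common(k)]


history = [
    "open inbox", "open message", "reply",
    "open inbox", "open message", "archive",
    "open inbox", "open message", "reply",
]
model = build_model(history)
print(suggest(model, "open message"))
```

The suggestions are only offered; in keeping with the slide, the user would explicitly request them, choose one, and review its outcome before the next step.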
12. Conclusion
❖ There are some successes but automation is not there yet
❖ The biggest technical challenge is uncertainty, which stems
from a lack of standardization
❖ The biggest HCI challenges are building trust and keeping
things “cheap”
❖ The HCI aspect of this talk is, to a large extent, applicable
to all automation tools, not just web automation. It is also
applicable to all users, not just visually impaired users
(think handheld and wrist-worn devices)