Skovsgaard Small Target Selection With Gaze Alone

Small-Target Selection with Gaze Alone
Henrik Skovsgaard∗, IT University of Copenhagen
Julio C. Mateo†, Wright State University
John M. Flach‡, Wright State University
John Paulin Hansen§, IT University of Copenhagen

∗ e-mail: hhje@itu.dk
† e-mail: mateo.2@wright.edu
‡ e-mail: john.flach@wright.edu
§ e-mail: paulin@itu.dk

Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail permissions@acm.org.
ETRA 2010, Austin, TX, March 22 – 24, 2010.
© 2010 ACM 978-1-60558-994-7/10/0003 $10.00

Abstract

Accessing the smallest targets in mainstream interfaces using gaze alone is difficult, but interface tools that effectively increase the size of selectable objects can help. In this paper, we propose a conceptual framework to organize existing tools and guide the development of new tools. We designed a discrete zoom tool and conducted a proof-of-concept experiment to test the potential of the framework and the tool. Our tool was as fast as and more accurate than the currently available two-step magnification tool. Our framework shows potential to guide the design, development, and testing of zoom tools to facilitate the accessibility of mainstream interfaces for gaze users.

CR Categories: H.5.1 [INFORMATION INTERFACES AND PRESENTATION]: Multimedia Information Systems—Evaluation/methodology

Keywords: gaze interaction, universal access, zoom interfaces

1 Introduction

Mainstream graphical user interfaces (GUIs) are generally designed with the mouse user in mind. As a consequence, users who rely on alternative input devices may encounter difficulties when accessing these GUIs. In this paper, we will focus on issues encountered by users of gaze tracking systems when selecting the smallest targets in mainstream GUIs. The limited accuracy of gaze pointing (when compared to mouse pointing) can make small-target selection very difficult for gaze-input users. Before discussing ways to address the limited accuracy of gaze input, we will briefly review how the gaze signal is processed and which factors affect gaze-pointing accuracy.

Point-and-select operations, such as pointing at an icon and clicking on it to open an application, are typical of mainstream GUIs. Mouse users physically move the mouse to point and press the mouse button to issue an activation (i.e., select). Pointing is straightforward for gaze-input users as well, but our eyes lack a selection mechanism. To identify when a user wants to issue an activation, gaze tracking systems divide eye movements into saccades and fixations.

Saccades are fast movements that cover relatively large spatial regions when users move their gaze from one location of interest to the next. Fixations are relatively slow movements performed in a limited spatial region when a user is inspecting an object of interest. Even during fixations, the eyes are continuously moving. This inherent eye jitter, combined with gaze tracker inaccuracies (e.g., offsets), negatively impacts gaze-pointing accuracy. To reliably identify fixations and saccades, gaze-tracking systems use algorithms based on velocity, dispersion, or a combination of both [Duchowski 2007]. For example, a velocity threshold can be set such that gaze velocities faster than this threshold are considered part of a saccade whereas slower velocities are considered part of a fixation.

Gaze-tracking systems use detected fixations and saccades to break gaze movements into pointing and selection components. If a saccade is detected, it is assumed to belong to the pointing component. However, fixations can occur both during pointing and during selection. That is, users may look at an object because they want to inspect it further (i.e., inspection fixations) or because they want to select it (i.e., selection fixations). The most common method to distinguish inspection and selection fixations is to set a time threshold (i.e., dwell time). That is, fixations lasting longer than dwell time are considered part of the selection component whereas shorter fixations are considered part of the pointing component. In general, a selection fixation results in an activation at the cursor location and, if the cursor is on top of a target, a target selection.

Approaches to address the limited accuracy of gaze pointing in order to enhance the accessibility to mainstream GUIs can be grouped into two categories. Some approaches aim at reducing the noise in the input (gaze) signal, whereas others aim at increasing the tolerance of interfaces to noisy inputs. These two approaches are not mutually exclusive and, in fact, usually complement each other.

1.1 Reducing Noise in the Input Signal

The most common way to reduce the noise in the gaze signal is to smooth (i.e., low-pass filter) the signal to increase the steadiness of the cursor. Most commercial gaze trackers smooth the input signal before displaying the cursor. In fact, it is generally accepted that, given the jitter inherent to eye movements, some degree of smoothing is necessary to use gaze as an input signal. However, smoothing also results in reduced responsiveness to gaze movements (i.e., time delay) and, therefore, there is a tradeoff between cursor steadiness and responsiveness. Actually, cursor smoothing effectively reduces the frame rate of the system by averaging across gaze samples.

Signal smoothing and fixation-detection algorithms are not independent from each other. On the one hand, the amount of smoothing applied to the gaze signal can impact the velocity threshold used in the fixation-detection algorithm. That is, smoother signals need lower velocity thresholds than less smooth signals to reliably distinguish between fixations and saccades. On the other hand, the output of fixation-detection algorithms can be used to inform when smoothing is applied. For example, cursor smoothing can be stopped as soon as the algorithm detects a saccade and re-activated during fixations to increase cursor responsiveness.

1.2 Increasing Interface Tolerance to Noise

An alternative approach to dealing with noisy inputs is to design GUIs that are tolerant to noise. For example, typing interfaces developed for gaze users display very large buttons (e.g., GazeTalk; [Hansen et al. 2003]) or provide other interface features to avoid the need to select small targets (e.g., Dasher; [Ward et al. 2000]).
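The signal-processing steps reviewed in Sections 1 and 1.1 (smoothing by averaging across gaze samples, velocity-threshold classification of fixations and saccades, and dwell-time selection) can be illustrated with a minimal Python sketch. This is not the implementation used by any particular gaze tracker: the function names, the 20 ms sampling interval, and the 1 px/ms velocity threshold are assumptions chosen for illustration. The 10-sample average and the 600 ms dwell time mirror values reported later in Section 4.

```python
from collections import deque
from math import hypot


def smooth(samples, window=10):
    """Moving average over the last `window` (x, y) gaze samples."""
    buf = deque(maxlen=window)
    out = []
    for point in samples:
        buf.append(point)
        n = len(buf)
        out.append((sum(p[0] for p in buf) / n, sum(p[1] for p in buf) / n))
    return out


def classify(samples, dt_ms, velocity_threshold):
    """Label each sample 'saccade' or 'fixation' by sample-to-sample velocity (px/ms)."""
    if not samples:
        return []
    labels = ["fixation"]  # the first sample has no predecessor; assumed to be a fixation
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = hypot(x1 - x0, y1 - y0) / dt_ms
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


def dwell_activations(labels, dt_ms, dwell_ms):
    """Return the sample indices at which a fixation first reaches the dwell time."""
    activations = []
    run_ms = 0
    fired = False
    for i, label in enumerate(labels):
        if label == "saccade":
            run_ms, fired = 0, False  # a saccade ends the current fixation
            continue
        run_ms += dt_ms
        if run_ms >= dwell_ms and not fired:
            activations.append(i)  # one activation per fixation
            fired = True
    return activations


# Hypothetical trace: a short fixation, then a jump to a new location that is held.
DT_MS = 20            # assumed sampling interval
THRESHOLD = 1.0       # assumed velocity threshold in px/ms
DWELL_MS = 600

raw = [(100.0, 100.0)] * 5 + [(400.0, 300.0)] * 50
smoothed = smooth(raw, window=10)

raw_hits = dwell_activations(classify(raw, DT_MS, THRESHOLD), DT_MS, DWELL_MS)
smoothed_hits = dwell_activations(classify(smoothed, DT_MS, THRESHOLD), DT_MS, DWELL_MS)
print("activation index without smoothing:", raw_hits)
print("activation index with 10-sample average:", smoothed_hits)
```

In this idealized, noise-free trace the averaged cursor takes several samples to settle on the new location, so the dwell timer starts later and the activation is issued later than with the raw signal. That is the steadiness versus responsiveness tradeoff described in Section 1.1; with real, jittery data the smoothing would also suppress spurious saccade detections, which this sketch does not model.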

Figure 1: Illustration of the different zoom tools. Row 1 depicts a target selection with dwell (i.e., no tool). Row 2 depicts how the continuous zoom tool gradually magnifies the target area. Row 3 depicts how n-step tools work. A two-step version would end before entering the Additional Magnification loop, a three-step version would go through the loop once, and so on. The shrinking red dots in rows 1 and 3 indicate dwell time.
[Figure 1 panels, each running from start time to end time with the target of interest marked: row 1, Dwell; row 2, Continuous Zoom; row 3, N-Step Zoom, with an Additional Magnification loop (N > 2).]

The use of dedicated software allows developers to have full access to the information underlying the environment in which the user is acting (e.g., target locations). This information can be used to aid small-target selection (e.g., force fields; [Zhang et al. 2008]). However, the development of dedicated GUIs for gaze users does not address accessibility to mainstream GUIs.

A way to increase the tolerance of mainstream GUIs to noise is to develop tools that interface with these GUIs to effectively increase the size of selectable objects. These tools are generally more limited than dedicated GUIs due to their inability to access all information (e.g., target locations) underlying mainstream GUIs. The most common of these tools is two-step magnification [Lankford 2000], which is often available in commercial gaze trackers. This two-step tool divides the point-and-select task into two steps requiring a point-and-select operation each. During the first step, the detection of a selection component does not result in an activation. Rather, a magnified (usually 2, 3, or 4x) version of the area surrounding the cursor pops up. During the second step, the detection of a selection component (on the magnified window) results in an activation. Assuming the target is within the magnified area, this tool effectively increases target size and, therefore, increases the GUI tolerance to noise. Although helpful for small-target selection, the two-step tool slows down interaction and may feel unnatural to the user.

2 Unanticipated Limitations of Zoom Tools

In an attempt to address the limitations of the two-step tool, we developed a zoom tool to access mainstream GUIs. This tool was inspired by previous work with dedicated interfaces (e.g., StarGazer; [Hansen et al. 2008]), which showed that zooming could help with noisy input. Bates and Istance [2002] had also proposed the use of zooming interfaces to facilitate access to mainstream GUIs for gaze-input users. However, their tool magnified the whole screen and was controlled manually. In contrast, our gaze-controlled tool presented a smooth animation surrounding the cursor. When a short fixation was detected, the content in this window gradually increased in size (as if approaching the user) for the duration of a predetermined zoom time. After this time elapsed, an activation was issued on the cursor position (i.e., the center of this window). See row 2 of Figure 1 for an illustration.

We expected this zoom tool to have at least four advantages over the two-step tool. First, we expected its continuous looming appearance to feel more natural to the user. Second, we expected the user to be able to make online corrections to the cursor position as the target increased in size. Third, we expected target selection to be faster because the user would not need to perform two separate point-and-select operations. Fourth, we expected the maximum magnification level possible to be greater than using a two-step tool with a window of similar size because the entire region around the cursor did not need to be magnified all at once.

In our previous experiment, we found that this zoom tool facilitated small-target selection when compared to no tool [Skovsgaard et al. 2008], but it did not compare favorably to a two-step tool. Rather, the two-step tool was more accurate and rated more favorably than the zoom tool. At least three factors might have contributed to the poor performance and ratings of the zoom tool. First, our zooming tool transformed a discrete point-and-select operation (with a still target) into a continuous tracking task (with a moving target). Second, once zooming started, the user could not control the rate at which content zoomed in. Third, the impact of the time delay resulting from processing and smoothing the gaze signal was amplified due to the first two factors. As a result, users' corrections often led to instability (i.e., increasing error, rather than reducing it). It is possible that performing a tracking task using gaze input would not be problematic without delay. However, some delay is inherent to all current gaze-tracking systems as a result of signal processing and smoothing. Therefore, tools developed to access mainstream GUIs must be tolerant to both noise and delay.

3 Re-evaluating the Design of Zoom Tools

In our first implementation, we did not anticipate how our continuous zoom tool would change the task or how delay would affect performance. Empirical results challenged our assumption that continuous interaction would always be more natural than discrete interaction. Instead, continuous interaction seemed unnatural with delayed feedback. In fact, the manual-control literature suggests that, in the presence of delays, users naturally adopt a move-and-wait strategy [Ferrell 1965]. That is, users transform the continuous task into a series of discrete components. Ironically, our attempt to make the task more natural backfired because, even though continuous interaction may be more natural in real-world situations, discrete interaction is more natural in the presence of time delays.

3.1 Discrete Zoom Tools

Based on the results of our first study, we designed a discrete zoom tool, which is conceptually equivalent to an n-step tool, combining
features of two-step and zoom tools (see row 3 of Figure 1 for an illustration). Because zooming occurs in discrete steps, we expected this tool to be more tolerant to delay than the continuous zoom tool.

When compared to the two-step tool, we expected more steps to permit greater magnification levels because, after the first step, the content can be magnified further without increasing window size. Obviously, adding steps can also slow down performance. However, given that early steps require lower accuracy than the two-step tool, we expected discrete zoom to accommodate lower dwell times. We also expected the discrete zoom tool to result in more of a zooming sensation than two-step while providing users more control over zooming rate than continuous zoom.

3.2 The Zoom Framework

Based on our experience developing and testing tools to facilitate the selection of small targets using gaze alone, we created a conceptual framework to organize existing tools designed for small-target selection (Figure 2). All the tools in this framework increase the effective size of targets (i.e., zoom) to facilitate small-target selection. This framework organizes tools in a discrete-to-continuous continuum. The two-step and continuous zoom tools can be placed, respectively, on the discrete and continuous ends of this continuum. The two-step tool suddenly increases target size to its maximum magnification level, whereas continuous zoom increases target size in what could be considered an infinite number of infinitely small steps. Consistent with these two extremes, tools closer to the discrete end of the spectrum tend to have fewer steps of longer duration, whereas tools closer to the continuous end of the spectrum tend to have more steps of shorter duration. The theoretical shorter duration per step of tools with more steps (i.e., more continuous) is the result of shorter dwell times when compared to tools with fewer steps (i.e., more discrete). Tools toward the continuous end of the spectrum tend to require the user to carry out a more tracking-like task, whereas tools toward the discrete end can be better characterized as a series of point-and-select operations. In addition, tools towards the continuous end of the spectrum tend to permit higher magnification levels because objects can increase in size within a window of constant size. Therefore, more continuous tools are less limited by the size of the zooming window.

Figure 2: The zoom framework.
[Figure 2 diagram: a continuum of the number of steps, from 2 (discrete) to ∞ (continuous); labels recovered from the figure: 2-Step, Dwell, Discrete Zoom, Continuous Zoom.]

In general, discrete zoom tools fall in between these two extremes. The specific three-step version we test below falls closer to the discrete end (see Figure 2). Even if close to two-step, we argue that this three-step tool can facilitate selection of very small targets and naturalness of interaction when compared to two-step magnification. We also argue that this framework may facilitate comparisons among tools. By studying how tools vary along the continuum, this framework could provide insights into useful tool features and suggest ways in which future designs can combine these features.

4 Discrete Zoom Tools: Proof of Concept

In order to study the potential of discrete zoom tools, we conducted an experiment to compare different zoom tools. Participants included 2 male expert users (first two authors) and 8 novices (2 males and 6 females). Novices had no previous experience with gaze interaction. We used an IG-30 eye tracker from Alea Technologies in a desktop setting. Participants were instructed to use a gaze-controlled cursor to point to the target present in the workspace as quickly and accurately as possible. Circular targets appeared one at a time at 1 of 16 possible locations equidistant (300 pixels) from the homing circle at the center. A trial started when a participant positioned the gaze cursor on the homing circle and ended as soon as the participant issued an activation using the corresponding method. A successful target selection was not required. Each participant completed 16 blocks of 16 trials, resulting in a total of 256 activations per participant. All independent variables were manipulated within participants and fixed within blocks.

We manipulated zoom tool, target size, and smoothing. Zoom tool had 4 levels: dwell (no zoom), two-step tool, three-step tool, and optimized three-step tool. The magnification level (4x) and dwell time (600 ms) of the two-step tool were chosen based on available versions of this tool. In fact, we purposefully chose a relatively high level of magnification and a relatively short dwell time. The three-step tool had the same magnification level and dwell time as the two-step tool, whereas the optimized three-step tool had twice the magnification (8x) and half the dwell time (300 ms). Achieving 8x magnification with a two-step tool is virtually impossible with a magnified window of the size used in this experiment. The 2 levels of target size were 6- and 12-pixel diameters (to represent some of the smallest targets in the environment). The 2 levels of smoothing (no smoothing and 10-sample average) were applied to the raw eye-tracker data and velocity thresholds were adjusted accordingly. We measured hit rate, completion time, and subjective ratings. Data were analyzed with a repeated measures ANOVA and LSD correction in the post-hoc tests.

We expected the three-step tool to: (a) feel more natural, (b) be more resistant to noisy input, and (c) enable reliable selection of smaller targets than the two-step tool. We did not expect discrete zoom to be faster than the two-step tool, but we did expect an optimized three-step version to achieve similar speeds to the two-step tool without sacrificing accuracy. This optimized version was expected to be able to accommodate lower dwell times and greater magnification levels than current two-step tools.

Due to space limitations, we emphasize the results that are most relevant to the zoom framework. All data analyses were conducted on the data from novices. Experts were used for comparison purposes. Target size, smoothing, and subjective-rating results will not be described in detail. Suffice it to say that target size affected hit rate but not completion time, whereas smoothing affected completion time but not hit rate. Hit rate was lower for smaller targets than for larger targets, F(1, 4) = 19.90, p < 0.05. Smoothing over 10 samples resulted in longer completion times than no smoothing, F(1, 4) = 11.06, p < 0.05. We found no evidence suggesting that no smoothing had a greater impact on the two-step than on the three-step tool. Therefore, this experiment did not support the hypothesis that a three-step tool is more resistant to noise than two-step. Preliminary analyses suggest that participants did not rate the three zoom tools differently from each other, but some differences were apparent between dwell and all three tools (i.e., dwell was rated as faster but less accurate than zoom tools). We found no evidence of the three-step tool being perceived as more natural than the two-step tool.

Zoom tool had a significant effect on hit rate, F(3, 21) = 32.43, p < 0.05. Mean hit rate was lowest without zoom (M = 0.04, SD = 0.03). The hit rates of the two-step (M = 0.24, SD = 0.11) and three-step tools (M = 0.29, SD = 0.12) were not significantly different from each other, t(7) = 1.22, p > 0.05. The optimized three-step tool (M = 0.48, SD = 0.14) had a higher hit rate than the three-step
tool, t(7) = 4.57, p < 0.05. These results are consistent with our hypothesis that better accuracy can be achieved with a three-step than with a two-step tool. Given the difference between three-step and optimized three-step, the accuracy advantage is probably due to the latter’s greater magnification level. Mean hit rates across zoom tools show a similar pattern for novices and experts (Figure 3).

Figure 3: Mean hit rates for the 8 novices and the 2 experts as a function of zoom tool.
[Figure 3 plot: x-axis, Zoom Tool (Dwell, Two-Step, Three-Step, Three-Step Optimized); y-axis, Mean Hit Rate (0.0 to 1.0); series, Novice and Expert.]

Zoom tool also had a significant effect on completion time, F(3, 21) = 119.04, p < 0.05. Completion times were shortest without zoom (M = 1581 ms, SD = 192 ms). The two-step (M = 3193 ms, SD = 441 ms) and optimized three-step tools (M = 3152 ms, SD = 375 ms) were not significantly different from each other, t(7) = 0.39, p > 0.05. The three-step tool (M = 3905 ms, SD = 442 ms) took longer than the two-step tool, t(7) = 5.35, p < 0.05. These results are consistent with our hypothesis that a three-step tool can achieve speeds comparable to a relatively fast version of the two-step tool (given shorter dwell time in the three-step tool). Again, the pattern of results was very similar for novices and experts (Figure 4).

Figure 4: Mean completion times for the 8 novices and the 2 experts as a function of zoom tool.
[Figure 4 plot: x-axis, Zoom Tool (Dwell, Two-Step, Three-Step, Three-Step Optimized); y-axis, Mean Completion Time (ms) (0 to 4500); series, Novice and Expert.]

Overall, the results of this experiment are promising. We found support for the possibility that discrete zoom tools can achieve similar speeds and greater accuracy than available two-step tools. Future research should explore whether this finding generalizes to situations in which distractors are present and to tasks in which successful target selection is required. Future studies should also explore whether a two-step tool could accommodate lower dwell times and whether having different dwell times for different steps could be beneficial. Our smoothing manipulation and subjective ratings did not support our hypothesis that three-step tools are more tolerant to noise and more natural than two-step tools. Research with a wider range of smoothing levels and subjective ratings could help determine whether this result is due to a lack of difference between tools or to a lack of sensitivity of the measures we used. Finally, even if mean values varied substantially, we found a similar pattern of results across a wide range of expertise levels. This result suggests that findings from novices may generalize to more experienced users and novice-user data may be useful to evaluate interface tools.

5 Summary and Conclusions

Selecting the smallest targets in mainstream GUIs using gaze alone is not easy. Although some tools exist, there is little theoretical guidance for the development of tools to facilitate accessibility to mainstream GUIs for gaze users. Based on our previous work, we proposed a conceptual framework to categorize existing tools and guide the development of new tools. As a proof of concept, we designed a discrete zoom tool and generated hypotheses about how it would compare to other zoom tools based on this framework. We conducted an experiment in which the optimized three-step discrete zoom tool we proposed achieved better performance than a two-step tool modeled after existing tools. Results suggest that our framework holds potential to guide the development of zoom tools to enhance accessibility to mainstream GUIs for gaze users.

References

BATES, R., AND ISTANCE, H. 2002. Zooming interfaces!: enhancing the performance of eye controlled pointing devices. In Proceedings of the fifth international ACM conference on Assistive technologies, ACM, Edinburgh, Scotland, 119–126.

DUCHOWSKI, A. T. 2007. Eye tracking methodology. Springer.

FERRELL, W. 1965. Remote manipulation with transmission delay. IEEE Transactions on Human Factors in Electronics 6, 24–32.

HANSEN, J. P., JOHANSEN, A. S., HANSEN, D. W., ITOH, K., AND MASHINO, S. 2003. Command without a click: Dwell time typing by mouse and gaze selections. In INTERACT 2003, IOS Press, 121–128.

HANSEN, D. W., SKOVSGAARD, H. H. T., HANSEN, J. P., AND MØLLENBACH, E. 2008. Noise tolerant selection by gaze-controlled pan and zoom in 3D. In Proceedings of the 2008 symposium on Eye tracking research & applications, ACM, Savannah, Georgia, 205–212.

LANKFORD, C. 2000. Effective eye-gaze input into windows. In Proceedings of the 2000 symposium on Eye tracking research & applications, ACM, Palm Beach Gardens, Florida, United States, 23–27.

SKOVSGAARD, H., MATEO, J., AND HANSEN, J. P. 2008. How can tiny buttons be hit using gaze only? In COGAIN 2008, COGAIN, Prague, Czech Republic, vol. 4, 38–42.

WARD, D. J., BLACKWELL, A. F., AND MACKAY, D. J. C. 2000. Dasher - a data entry interface using continuous gestures and language models. In Proceedings of the 13th annual ACM symposium on User interface software and technology, ACM, San Diego, California, United States, 129–137.

ZHANG, X., REN, X., AND ZHA, H. 2008. Improving eye cursor’s stability for eye pointing tasks. In Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM, Florence, Italy, 525–534.
