Evaluation of a Remote Webcam-Based Eye Tracker
Henrik Skovsgaard, Javier San Agustin, Sune Alstrup Johansen, John Paulin Hansen
IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen S
hhje@itu.dk, javier@itu.dk, sune@itu.dk, paulin@itu.dk

Martin Tall
Duke University, 2424 Erwin Rd. (Hock Plaza), Durham, NC 27705
info@martintall.com
ABSTRACT
In this paper we assess the performance of an open-source gaze tracker in a remote (i.e. table-mounted) setup, and compare it with two commercial eye trackers. An experiment with 5 subjects showed the open-source eye tracker to have a significantly higher level of accuracy than one of the commercial systems, a Mirametrix S1, but also a higher error rate than the other commercial system, a Tobii T60. We conclude that the web-camera solution may be viable for people who need a substitute for mouse input but cannot afford a commercial system.

Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User interfaces—Evaluation/methodology

General Terms
Human factors, Experimentation, Performance, Measurement

Keywords
Gaze interaction, low-cost gaze tracking, performance evaluation, universal access

1. INTRODUCTION
Gaze tracking systems enable people with severe motor disabilities to communicate using only their eye movements. However, some of them cannot afford a commercial system, which costs between $5,000 and $30,000. While the quality of these systems has improved dramatically over the years, the price has remained more or less constant. Systems that employ low-cost and off-the-shelf hardware components are becoming increasingly popular as camera technology improves. The use of off-the-shelf hardware components in gaze tracking represents a growing research field [2].

In 2004, Babcock and Pelz [1] presented a head-mounted eye tracker that uses two small cameras attached to a pair of safety glasses. Li et al. [4] extended their work and built a similar system that worked in real time, called OpenEyes. Being head-mounted, both systems are affected by head movements and are thus not suitable for use in combination with a desktop computer. Although the components used in the systems described above are inexpensive, assembling the hardware requires advanced knowledge of electronics. Zielinski's Opengazer system [10], based on a remote webcam, takes a simpler hardware approach. Its gaze estimation method is not tolerant to head movements, however, and therefore the user needs to keep the head still after calibration.

Sewell and Komogortsev [7] developed a neural-network-based eye tracker able to run on a personal computer's built-in webcam under normal lighting conditions (i.e., no infrared light). The aim of their study was to employ eye tracking without any modifications to the hardware. The five participants in the study complained that even during fixations they felt a jumpy sensation of the marker, and that the marker was unstable during use.

The ITU Gaze Tracker (http://www.gazegroup.org) is open-source gaze tracking software that can be used with low-cost and off-the-shelf hardware, such as webcams and video cameras. The software tracks the pupil and one or two corneal reflections produced by infrared light sources. The first version of the system was introduced and evaluated by San Agustin et al. in [6]. The results obtained indicated that a low-cost system built with a webcam could have the same performance as expensive commercial systems. However, the system required placing the webcam very close to the user's eye, which was not comfortable. Furthermore, the camera blocked part of the user's view. Being a head-mounted system, it also required the user to sit completely still, as head movements affected the cursor position.

The second version of the system enables remote eye tracking by using a camera with a narrow field of view. The same webcam used in [6] can be employed by replacing the standard wide-angle lens with an inexpensive 16 mm zoom lens. Figure 1 shows the hardware configuration for such a remote system with a webcam and two light sources.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NGCA '11, May 26-27, 2011, Karlskrona, Sweden.
Copyright 2011 ACM 978-1-4503-0680-5/11/05 ...$10.00.
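To give a rough idea of the image-processing step involved in this kind of tracker, the sketch below estimates a pupil center as the centroid of dark pixels in an infrared image. This is an illustration only, not the ITU Gaze Tracker's actual code; the fixed threshold and the synthetic test image are our own assumptions.

```python
import numpy as np

def pupil_center(frame, threshold=50):
    """Estimate the pupil center as the centroid of dark pixels.

    In an infrared eye image the pupil appears as a dark blob and the
    corneal reflections (glints) as small bright spots; a fixed intensity
    threshold is a crude but illustrative segmentation (assumed here).
    """
    ys, xs = np.nonzero(frame < threshold)   # coordinates of dark pixels
    if xs.size == 0:
        return None                          # no pupil candidate found
    return xs.mean(), ys.mean()              # centroid (x, y)

# Synthetic 200x200 "eye image": bright background, dark pupil at (120, 80).
frame = np.full((200, 200), 200, dtype=np.uint8)
yy, xx = np.ogrid[:200, :200]
frame[(xx - 120) ** 2 + (yy - 80) ** 2 < 15 ** 2] = 20   # dark pupil disc

cx, cy = pupil_center(frame)
print(round(cx), round(cy))   # 120 80
```

A real tracker would of course add blob filtering, glint localization, and a calibration mapping from eye features to screen coordinates.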
Figure 1: Hardware configuration for the webcam-based gaze tracker.

The aim of this study is to investigate whether the performance of the remote, webcam-based ITU Gaze Tracker (costing around $100) can match the performance of two commercial gaze-tracking systems, a Tobii T60 ($25,000) and a Mirametrix S1 ($6,000), in an interaction task.

2. PERFORMANCE METRICS

2.1 Accuracy and Precision
The performance of a sensor is typically measured in terms of accuracy and precision, where accuracy refers to the degree to which the sensor readings represent the true value of what is measured, while precision (also known as spatial resolution) refers to the extent to which successive readings of the same physical phenomenon agree in value [8].

The working copy of the COGAIN report Eye tracker accuracy terms and definitions [5] provides a set of definitions and terminologies for measuring the accuracy and precision of an eye tracking system. Here, accuracy A_deg is defined as the average angular distance θ_i (measured in degrees of visual angle) between n fixation locations and the corresponding fixation targets (Equation 1).

    A_deg = (1/n) · Σ_{i=1..n} θ_i    (1)

Spatial precision is calculated as the root mean square (RMS) of the angular distances θ_i (measured in degrees of visual angle) between successive samples (x_i, y_i) and (x_{i+1}, y_{i+1}) (Equation 2).

    RMS_deg = sqrt( (1/n) · Σ_{i=1..n} θ_i² )    (2)

The working copy of the COGAIN report does not state how the angular distances θ should be calculated. On computers, distances are typically measured in pixels, so for this experiment we used a function to map distances in pixels, Δpx, to degrees of visual angle, Δ°. Besides the distance in pixels, the physical size of a pixel S and the distance from user to screen D need to be known (Equation 3).

    Δ° = (360/π) · tan⁻¹( (Δpx · S) / (2·D) )    (3)

2.2 Target Acquisition
In order to evaluate the performance of the different input devices, we followed the methodology described by the ISO 9241-9 standard for non-keyboard input devices [3]. The performance is quantified by the throughput and error rate of each device.

Calculating the throughput is based on the effective target width W_e and the effective distance D_e, which are used to calculate the effective index of difficulty ID_e following Equation 4. Throughput is measured in bps and is calculated as the relationship between the effective index of difficulty ID_e and the movement time MT (Equation 5) [9].

    ID_e = log2( D_e / W_e + 1 ),    W_e = 4.133 · SD_x    (4)

    Throughput = ID_e / MT    (5)

3. PERFORMANCE EVALUATION

3.1 Participants
A total of five participants, three male and two female, with ages ranging from 29 to 39 years (M = 34 years, SD = 4.3), volunteered to participate in the study. Three of the participants had no previous experience with gaze interaction. One of them used contact lenses.

3.2 Apparatus
The computer used was a desktop computer with a 2.6 GHz Intel Dual Core processor and 3 GB RAM running Windows XP SP3. We used the 17" monitor with a resolution of 1280×1024 that comes with the Tobii T60 system. Three gaze trackers and a Logitech optical mouse (for baseline comparison) were tested as input devices. Two of the three gaze trackers were the commercial systems Tobii T60 and Mirametrix S1. The third system was the ITU Gaze Tracker, using a Sandberg Nightcam 2 webcam running at 30 fps with a 16 mm lens and two Sony HVL-IRM infrared light sources. The total cost was around $100. The three gaze trackers used a 9-point calibration procedure. Figure 2 shows the experimental setup.

3.3 Design and Procedure
After calibrating the system, participants completed an accuracy test followed by a 2D target-selection task. Participants sat approximately 60 cm away from the monitor and were asked to sit as still as possible. The experiment was conducted employing a within-subjects factorial design. The target-selection task had the following independent variables and levels:

• Device (4): Mouse, Tobii T60, Mirametrix, Webcam
• Amplitude (2): 450, 900 pixels
• Target Width (2): 75, 100 pixels

The dependent variables in the study were accuracy (degrees), precision (degrees), throughput (bps) and error rate (%).
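The metrics of Section 2 are straightforward to compute from logged gaze samples. The following sketch implements Equations 1-5 directly; the function names and the example numbers (pixel pitch, endpoint spread) are our own illustrative assumptions, not values from the experiment.

```python
import math

def px_to_deg(d_px, pixel_size, viewing_dist):
    """Equation 3: distance in pixels -> degrees of visual angle.

    pixel_size (physical size of one pixel) and viewing_dist must
    share units, e.g. millimetres.
    """
    return (360.0 / math.pi) * math.atan((d_px * pixel_size) / (2.0 * viewing_dist))

def accuracy_deg(angular_errors):
    """Equation 1: mean angular distance between fixations and targets."""
    return sum(angular_errors) / len(angular_errors)

def precision_rms_deg(angular_steps):
    """Equation 2: RMS of angular distances between successive samples."""
    return math.sqrt(sum(t * t for t in angular_steps) / len(angular_steps))

def effective_id(d_e, sd_x):
    """Equation 4: effective index of difficulty, with W_e = 4.133 * SD_x."""
    w_e = 4.133 * sd_x
    return math.log2(d_e / w_e + 1.0)

def throughput_bps(id_e, mt):
    """Equation 5: throughput in bits per second (mt = movement time in s)."""
    return id_e / mt

# A 100 px error on a display with 0.264 mm pixels viewed at 600 mm
# corresponds to roughly 2.5 degrees of visual angle.
print(round(px_to_deg(100, 0.264, 600), 2))   # 2.52

# A 450 px movement with endpoint spread SD_x = 30 px and 1.2 s movement time.
print(round(throughput_bps(effective_id(450, 30), 1.2), 2))   # 1.84
```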
Figure 2: Experimental setup. The participant is conducting the test using the Mirametrix system.

Figure 3: Accuracy and Precision by device. Error bars show ± SD. (Bar chart, y-axis in degrees; x-axis: Mouse, Tobii, Mirametrix, Webcam.)
Each participant completed 4 blocks of 1 trial (i.e., 4 trials) for the accuracy and precision test, and 16 blocks of 15 trials (i.e., 240 trials) for the target-selection task, where device, amplitude, and target width were fixed within blocks. The orders of input device and task were counterbalanced across users to neutralize learning effects. Participants were encouraged to take a comfortable position in front of the computer and remain as still as possible during the test. The total test session lasted approximately 15 minutes.

Immediately after a successful calibration, participants were instructed to gaze at a randomly appearing target in a 4×4 matrix (evenly distributed, with 100 pixels to the borders of the monitor). A new target would appear when a total of 50 samples had been recorded at 30 Hz. Premature samples were avoided with a smooth animated transition between targets plus a reaction delay of 600 ms. Furthermore, samples further than M ± 3 × SD away were considered outliers. To prevent distractions from cursor movements, we hid the cursor throughout the blocks except, of course, for the mouse condition.

Once the accuracy test was completed, the target-selection task started. Participants were presented with 15 circular targets arranged in a circle in the center of the screen. Targets were highlighted one by one, and participants were instructed to select the highlighted target as quickly and as accurately as possible. Selections were performed with the spacebar for the gaze trackers and a left-button click for the mouse condition. Activations outside the target area were regarded as misses and thus counted towards the error rate. Every selection ended the current trial and started the next one. Based on the amplitudes and target widths, the nominal indexes of difficulty were between 2.5 and 3.7 bits.

4. RESULTS

4.1 Accuracy and Precision
Analysis of the accuracy and precision was performed using a one-way ANOVA, with device as the independent variable. Accuracy and precision were analyzed as the dependent variables. 228 outliers of the 16,000 samples were removed from the analysis. An LSD post-hoc test was applied after the analysis. Figure 3 shows a plot of the average accuracy and precision per device.

Mean accuracy for mouse, Tobii, Mirametrix and webcam was 0.14°, 0.67°, 1.34° and 0.88°, respectively (left-side bars in Figure 3). The main effect of device on accuracy was statistically significant, F(3, 12) = 16.03, p < 0.001. The post-hoc test showed a significant difference between the mouse and all of the gaze trackers. Tobii performed significantly better than Mirametrix, t(4) = 3.65, p < 0.05. The webcam also performed significantly better than Mirametrix, t(4) = 4.42, p < 0.05. There was no significant difference between the webcam and Tobii, t(4) = 1.57, p > 0.05.

Mean precision for mouse, Tobii, Mirametrix and webcam was 0.05°, 0.08°, 0.43° and 0.31°, respectively (right-side bars in Figure 3). Mauchly's test indicated that the assumption of sphericity had been violated, χ²(5) = 16.60, p < 0.01; therefore degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.47). The results show that there was no significant effect of device on precision, F(1.42, 5.67) = 4.38, p = 0.08.

4.2 Throughput and Error Rate
Analysis of the target-selection task was performed using a 4×2×2 ANOVA, with device, amplitude and target width as the independent variables. Throughput and error rate were analyzed as the dependent variables. An LSD post-hoc test was applied after the analysis. All data were included.

Mean throughput for mouse, Tobii, Mirametrix and webcam was 4.00, 2.63, 2.00 and 2.31 bps, respectively (left-side bars in Figure 4). The main effect of device on throughput was statistically significant, F(3, 12) = 9.61, p < 0.01. The post-hoc test showed a significant difference between the mouse and all other devices. There was a main effect of amplitude, F(3, 12) = 10.73, p < 0.05, with short amplitudes (M = 2.83 bps) having a significantly higher throughput than long amplitudes (M = 2.62 bps), t(4) = 3.30, p < 0.05. No significant effect of target width was found, F(3, 12) = 2.00, p = 0.23.

Mean error rate for mouse, Tobii, Mirametrix and webcam was 5.34%, 19.21%, 39.29% and 27.50%, respectively (right-side bars in Figure 4). The main effect of device on error rate was statistically significant, F(3, 12) = 9.71, p < 0.01. The post-hoc test showed a significant difference between the mouse and all other devices. Tobii had a significantly lower error rate than the webcam, t(4) = 4.96, p < 0.05. We found no effect of amplitude, F(3, 12) = 0.37, p = 0.58, nor of target width, F(3, 12) = 0.37, p = 0.58.
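The M ± 3 × SD outlier criterion used in this analysis can be sketched as follows. This is a generic illustration, not the authors' analysis code; the sample values are invented, and the paper does not specify whether the population or sample standard deviation was used (the population form is assumed here).

```python
import statistics

def remove_outliers(samples, k=3.0):
    """Keep only samples within k standard deviations of the mean (M ± k*SD)."""
    m = statistics.mean(samples)
    sd = statistics.pstdev(samples)   # population SD; an assumption
    return [s for s in samples if abs(s - m) <= k * sd]

# Hypothetical angular errors (degrees) with one spurious 9-degree sample.
errors = [0.5, 0.6, 0.7, 0.8, 0.6, 0.5, 0.7, 0.6, 0.8, 0.7,
          0.5, 0.6, 0.7, 0.6, 0.8, 0.5, 0.7, 0.6, 0.8, 9.0]

filtered = remove_outliers(errors)
print(len(errors) - len(filtered))   # 1
```

Note that a single extreme sample inflates the SD itself, so the criterion only rejects values that are extreme relative to the inflated spread; with very few samples a gross outlier can survive the filter.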
Figure 4: Overall throughput and error rate by device. Error bars show ± SD. (Bar chart; throughput in bps and error rate in % for Mouse, Tobii, Mirametrix and Webcam.)

5. DISCUSSION
Our results suggest that the accuracy of the webcam-based gaze tracker (0.88°) is significantly better than the accuracy of the Mirametrix system (1.34°), while showing no significant difference to the Tobii T60 (0.67°). This indicates that the ITU Gaze Tracker can be used in software applications meant to be controlled by gaze input.

Although we did not find any significant effect of the individual devices in the precision study, the data indicate that the mouse and the Tobii system had a higher precision than the Mirametrix S1 and the webcam-based system. It must be noted that the precision is calculated after the low-pass filtering that the eye trackers perform on the data samples during fixations. This is done to smooth the signal and prevent a jittery cursor from annoying the user. The ITU Gaze Tracker gives users control over the level of smoothing during fixations, a feature that many commercial systems do not provide.

The results obtained in the target-selection task indicate that the webcam-based eye tracker has a performance similar to the other two commercial systems in terms of throughput. The error rate of the webcam tracker was, however, significantly higher than the error rate of the Tobii T60. Throughput values were slightly lower than in previous studies [6, 9]. This can be due to the lower control over the hardware setup in our experiment, as well as the lack of experience of the novice users, who tended to be rather slow.

6. CONCLUSION
Our study on performance evaluation shows that a remote, webcam-based eye tracker can have a performance comparable to expensive systems. However, there are other factors crucial for the practical usefulness of an eye tracking device that have not been evaluated in this study, such as the quality of the documentation, the API, tolerance to head movements, ease of use and stability over time.

In our future work, we aim to further investigate these issues and implement new algorithms to improve the performance. Specifically, we would like to explore how continuous recalibration and repositioning of the participants can improve performance over time. We would also like to test various hardware setups for the ITU Gaze Tracker (e.g., better cameras) and different algorithms for calculating the point of regard. A usability and user experience study should also be conducted to include subjective measures of the different systems.

Finally, it is our hope that researchers, students and hobbyists will collaborate in the development of the software and contribute to making the open-source ITU Gaze Tracker a more reliable system.

7. ACKNOWLEDGEMENTS
We would like to thank EYEFACT for supporting the experiment, and the open-source community for their help with improving the ITU Gaze Tracker.

8. REFERENCES
[1] J. S. Babcock and J. B. Pelz. Building a lightweight eyetracking headgear. In Proceedings of the 2004 Symposium on Eye Tracking Research & Applications, pages 109–114, San Antonio, Texas, 2004. ACM.
[2] J. P. Hansen, D. Hansen, and A. Johansen. Bringing gaze-based interaction back to basics. In Universal Access in HCI (UAHCI): Towards an Information Society for All, volume 3, pages 325–329, New Orleans, USA, 2001. Lawrence Erlbaum.
[3] ISO. Ergonomic requirements for office work with visual display terminals (VDTs), Part 9: Requirements for non-keyboard input devices. International Organization for Standardization, 2000.
[4] D. Li, J. Babcock, and D. J. Parkhurst. openEyes. In Proceedings of the 2006 Symposium on Eye Tracking Research & Applications, pages 95–100, San Diego, California, 2006. ACM.
[5] F. Mulvey. Eye tracker accuracy terms and definitions (working copy). Technical report, COGAIN, 2010.
[6] J. San Agustin, H. Skovsgaard, J. P. Hansen, and D. W. Hansen. Low-cost gaze interaction: ready to deliver the promises. In Proceedings of CHI '09, pages 4453–4458, Boston, MA, USA, 2009. ACM.
[7] W. Sewell and O. Komogortsev. Real-time eye gaze tracking with an unmodified commodity webcam employing a neural network. In Proceedings of the 28th International Conference Extended Abstracts on Human Factors in Computing Systems, pages 3739–3744, New York, USA, 2010. ACM.
[8] A. D. Wilson. Sensor- and recognition-based input for interaction. In The Human-Computer Interaction Handbook, pages 177–199. Lawrence Erlbaum Associates, 2007.
[9] X. Zhang and I. S. MacKenzie. Evaluating eye tracking with ISO 9241, Part 9. In Proceedings of the 12th International Conference on HCI: Intelligent Multimodal Interaction Environments, pages 779–788, Beijing, China, 2007. Springer.
[10] P. Zielinski. Opengazer: open-source gaze tracker for ordinary webcams. http://www.inference.phy.cam.ac.uk/opengazer/, 2010.