Space-Variant Spatio-Temporal Filtering of Video for Gaze Visualization and Perceptual Learning

Michael Dorr (dorr@inb.uni-luebeck.de), Institute for Neuro- and Bioinformatics, University of Lübeck
Halszka Jarodzka (h.jarodzka@iwm-kwmrc.de), Knowledge Media Research Center, Tübingen
Erhardt Barth (barth@inb.uni-luebeck.de), Institute for Neuro- and Bioinformatics, University of Lübeck

Abstract

We introduce an algorithm for space-variant filtering of video based on a spatio-temporal Laplacian pyramid and use this algorithm to render videos in order to visualize prerecorded eye movements. Spatio-temporal contrast and colour saturation are reduced as a function of distance to the nearest gaze point of regard, i.e. non-fixated, distracting regions are filtered out, whereas fixated image regions remain unchanged. Results of an experiment in which the eye movements of an expert on instructional videos are visualized with this algorithm, so that the gaze of novices is guided to relevant image locations, show that this visualization technique facilitates the novices' perceptual learning.

CR Categories: I.4.9 [Image Processing and Computer Vision]: Applications; I.4.3 [Image Processing and Computer Vision]: Enhancement—Filtering; I.4.10 [Image Processing and Computer Vision]: Image Representation—Multidimensional; K.3.1 [Computers and Education]: Computer Uses in Education—Computer-assisted instruction (CAI)

Keywords: gaze visualization, space-variant filtering, spatio-temporal Laplacian pyramid, perceptual learning

Figure 1: Still shot from an instructional video on classifying fish locomotion patterns. The eye movements of the expert giving voice-over explanations are visualized by space-variant filtering on a spatio-temporal Laplacian pyramid: spatio-temporal contrast and colour saturation are reduced in unattended areas. This visualization technique aids novices in acquiring the expert's perceptual skills.

1     Introduction
Humans move their eyes several times per second to successively sample visual scenes with the high-resolution centre of the retina. The direction of gaze is tightly linked to attention, and what people perceive ultimately depends on where they look [Stone et al. 2003]. Naturally, the ability to record eye movement data led to the need for meaningful visualizations. One-dimensional plots of the horizontal and vertical components of eye position over time have been in use since the very first gaze recording experiments (Delabarre [1898] affixed a small cap to the cornea to transduce eye movements onto a rotating drum, using plaster of Paris as glue).

Such plots are useful for detailed quantitative analyses, but they are not very intuitive to interpret. Other tools supporting interpretation of the data include the visualization of gaze density by means of clustered gaze samples [Heminghous and Duchowski 2006] or the visualization of other features such as fixation duration [Ramloll et al. 2004].

Better suited for visual inspection are approaches that use the stimulus itself and enrich it with eye movement data; in the classical work of Yarbus [1967], gaze traces overlaid on the original images immediately show the regions that were preferentially looked at by the subjects. Because of the noisy nature of both eye movements and their measurement, there is also an indirect indication of fixation duration (traces are denser in areas of longer fixation). However, such abstract information can also be extracted from the raw data and presented in condensed form: for example, bars of different size are placed in a three-dimensional view of the original stimulus to denote fixation duration in [Lankford 2000]; in a more application-specific manner, Špakov and Räihä [2008] annotate text with abstract information on gaze behaviour for the analysis of translation processes.

Another common method is the use of so-called fixation maps [Velichkovsky et al. 1996; Wooding 2002]. Here, a probability density map is computed by the superposition of Gaussians, each centred at a single fixation (or raw gaze sample), with a subsequent normalization step. Areas that were fixated more often are thus assigned higher probabilities; by varying the width of the underlying Gaussians, it is possible to vary the distance up to which two fixations are considered similar. Based on this probability map, the stimulus images are processed so that, for example, luminance is gradually reduced in areas that received little attention; so-called heat maps mark regions of interest with transparently overlaid colours. In [Špakov and Miniotas 2007], the authors add "fog" to render visible only the attended parts of the stimulus.
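To make this construction concrete, the following minimal Python sketch (our own illustration, not code from the cited papers; the list-of-(x, y)-tuples input format and the sigma value are assumptions) computes such a normalized fixation map:

    import numpy as np

    def fixation_map(fixations, width, height, sigma=30.0):
        """Superimpose an isotropic Gaussian at each fixation, then normalize.

        fixations: iterable of (x, y) pixel positions (assumed format);
        sigma controls the distance up to which two fixations are
        considered similar.
        """
        ys, xs = np.mgrid[0:height, 0:width]
        density = np.zeros((height, width))
        for fx, fy in fixations:
            density += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))
        total = density.sum()
        return density / total if total > 0 else density  # probability map

    # Example: three gaze samples on a 720 by 576 frame
    fmap = fixation_map([(100, 200), (110, 205), (400, 300)], 720, 576)

Varying sigma in this sketch directly implements the similarity distance mentioned above.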
For dynamic stimuli, such as movies, all the above techniques can be applied as well; one straightforward extension from images to image sequences would be to apply the fixation map technique to every video frame individually. Care has to be taken, however, to appropriately filter the gaze input in order to ensure a smooth transition between video frames.

In this paper, we present an algorithm to visualize dynamic gaze density maps by locally modifying spatio-temporal contrast on a spatio-temporal Laplacian pyramid. In regions of low interest, spectral energy is reduced, i.e. edge and motion intensity are dampened, whereas regions of high interest remain as in the original stimulus. Conceptually, this algorithm is related to gaze-contingent displays simulating visual fields based on Gaussian pyramids [Geisler and Perry 2002; Nikolov et al. 2004]; in these approaches, however, only fine spatial details were blurred selectively. In earlier work, we have already extended these algorithms to the temporal domain [Böhme et al. 2006], and we here combine both spatial and temporal filtering. Furthermore, the work presented here goes beyond blurring, i.e. the specification of a single cutoff frequency per output pixel, and allows individual weights to be assigned to each frequency band. This means that, for example, spatio-temporal contrast can be modified while leaving details intact. However, although our implementation of this algorithm achieves a rendering performance of more than 50 frames per second on a commodity PC, it cannot be used for real-time gaze-contingent applications, because there all levels of the underlying pyramid need to be upsampled to full temporal resolution for every video frame. Its purpose is the off-line visualization of prerecorded gaze patterns. A computationally much more challenging version of a spatio-temporal Laplacian pyramid that is suitable also for gaze-contingent displays is the topic of a forthcoming paper.

Pyramid-based rendering as a function of gaze has been shown to have a guiding effect on eye movements [Dorr et al. 2008; Barth et al. 2006]. To further demonstrate the usefulness of our algorithm, we will present some results from a validation experiment in which students received instructional videos either with or without a visualization of the eye movements of an expert watching the same stimulus. Results show that the visualization technique presented here indeed facilitates perceptual learning and improves students' later visual search performance on novel stimuli.

2 Laplacian Pyramid in Space and Space-Time

The so-called Laplacian pyramid serves as an efficient bandpass representation of an image [Burt and Adelson 1983]. In the following section, we will briefly review its application to images and then extend the algorithm to the spatio-temporal domain. We will here use an isotropic pyramid, i.e. all spatial and temporal dimensions are treated equally; this results in a bandpass representation in which, e.g., low spatial and low temporal frequencies, and high spatial and high temporal frequencies, are represented together, respectively. For a finer-grained decomposition of the image sequence into spatio-temporal frequency bands, an anisotropic Laplacian pyramid could be used instead; using such a decomposition, one could also obtain frequency bands of high spatial but low temporal frequencies, etc. For a straightforward implementation, one might first create a spatial pyramid for each frame of the input sequence and then decompose each level of that spatial pyramid in time (as in Section 2.2.2, but omitting the spatial up- and downsampling). However, the finer spectral resolution comes at the cost of a significantly increased number of pixels that need to be stored and processed; this increase is on the order of 1.16 · T times as many pixels for an anisotropic pyramid with T temporal levels.

Figure 2: Analysis phase of a Laplacian pyramid in space. Based on the Gaussian pyramid on the left side, which stores successively smaller image versions (with higher-frequency content successively removed), differences of Gaussian pyramid levels are formed to obtain individual frequency bands (right side). To be able to form these differences, lower levels have to be upsampled before subtraction (middle). The gray bars indicate, relative to the original spectrum, what frequency band is stored in each image. The extension into the temporal domain results in lower frame rates for the smaller video versions (not shown).

Figure 3: Synthesis phase of a Laplacian pyramid in space. The Laplacian levels are iteratively upsampled to obtain a series of reconstructed images R_N, R_{N-1}, ..., R_0 with increasing cutoff frequencies. If the L_n remain unchanged, R_0 is an exact reproduction of the original input image G_0.

2.1 Spatial Domain

The Laplacian pyramid is based on a Gaussian multiresolution pyramid, which stores successively smaller versions of an image; usually, resolution is reduced by a factor of two in each downsampling step (for two-dimensional images, the number of pixels is thus reduced by a factor of four). Prior to each downsampling step, the image is appropriately lowpass filtered so that high-frequency content is removed; the lower-resolution downsampling result then fulfills the conditions of the Nyquist theorem and can represent the (filtered) image without aliasing. For a schematic overview, we refer to Figures 2 and 3.
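As an illustration of this reduce/expand scheme, here is a minimal spatial-only sketch in Python (assuming numpy and scipy; this is our own sketch, not the authors' implementation), using the binomial kernel (1, 4, 6, 4, 1) that is introduced for the temporal case in Section 2.2.2:

    import numpy as np
    from scipy.ndimage import convolve

    KERNEL = np.outer([1, 4, 6, 4, 1], [1, 4, 6, 4, 1]) / 256.0  # separable binomial lowpass

    def reduce_(img):
        """Lowpass filter, then drop every other row and column."""
        return convolve(img, KERNEL, mode='nearest')[::2, ::2]

    def expand(img, shape):
        """Insert zeros, then interpolate; the factor 4 compensates the zeros."""
        up = np.zeros(shape)
        up[::2, ::2] = img
        return 4.0 * convolve(up, KERNEL, mode='nearest')

    def build_laplacian(img, levels):
        gauss = [img]
        for _ in range(levels):
            gauss.append(reduce_(gauss[-1]))
        lap = [g - expand(gauss[k + 1], g.shape) for k, g in enumerate(gauss[:-1])]
        return lap + [gauss[-1]]      # the lowest level stores G_N itself

    def reconstruct(lap):
        img = lap[-1]
        for band in reversed(lap[:-1]):
            img = band + expand(img, band.shape)
        return img                    # exact reproduction of the input

Because the same expand operation is used during analysis and synthesis, reconstruct(build_laplacian(img, n)) reproduces the input exactly, which mirrors the statement in the caption of Figure 3.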
2.2 Spatio-Temporal Domain

The Gaussian pyramid algorithm was first applied to the temporal domain by Uz et al. [1991]. In analogy to dropping every other pixel during the spatial downsampling step, every other frame of an image sequence is discarded to obtain lower pyramid levels (cf. Figure 4). In the temporal domain, however, the problem arises that the number of frames to process is not necessarily known in advance and is potentially infinite; it is therefore not feasible to store the whole image sequence in memory. Nevertheless, video frames from the past and the future need to be accessed during the lowpass filtering step prior to the downsampling operation; thus, care has to be taken which frames to buffer. In the following, we will refer to these frames as history and lookahead frames, respectively.

Figure 4: Spatio-temporal Gaussian pyramid with three levels and c = 2. Lower levels have reduced resolution both in space and in time. Note that history and lookahead video frames are required because of the temporal filter with symmetric kernel, e.g. the computation of G_2(1) depends on G_1(4), which in turn depends on G_0(6).

Both for the subtraction of adjacent Gaussian pyramid levels (to create Laplacian levels) and for the reconstruction step (in which the Laplacian levels are recombined), lower levels first have to be upsampled to match the resolution of the higher level. Following these upsampling steps, the results have to be filtered to interpolate at the inserted pixels and frames; again, history and lookahead video frames are required. We will now describe these operations in more detail and analyse the number of video frames to be buffered.

2.2.1 Notation

The sequence of input images is denoted by I(t); input images have a size of W by H pixels and an arbitrary number of colour channels (individual channels are treated separately). A single pixel at location (x, y) and time t is referred to as I(t)(x, y); in the following, operations on whole images, such as addition, are to be applied pixelwise to all pixels.

The individual levels of a Gaussian multiresolution pyramid with N + 1 levels are referred to as G_k(t), 0 ≤ k ≤ N. The highest level G_0 is the same as the input sequence; because of the spatio-temporal downsampling, lower levels have fewer pixels and a lower frame rate, so that G_k(n) has a spatial resolution of W/2^k by H/2^k pixels and corresponds to the same point in time as G_0(2^k n). Spatial up- and downsampling operations on an image I are denoted as ↑[I] and ↓[I], respectively. For time steps t that are not a multiple of 2^N, not all pyramid levels have a corresponding image G_k(t/2^k). Therefore, C_t denotes the highest index of levels with valid images at time t, i.e. C_t is the largest integer with C_t ≤ N and t mod 2^{C_t} = 0. Similar to the Gaussian levels G_k, we refer to the levels of the Laplacian pyramid as L_k(t), 0 ≤ k ≤ N (again, resolution is reduced by a factor of two in all dimensions with increasing k); the intermediate steps during the iterative reconstruction of the original signal are denoted as R_k(t).

The temporal filtering that is required for temporal down- and upsampling introduces a latency (see the next sections). The number of lookahead items required on level k is denoted by λ_k for the analysis phase and by Λ_k for the synthesis phase.

2.2.2 Analysis Phase

To compute the Laplacian levels, the Gaussian pyramid has to be created first (see Figure 2). The relationship of the different Gaussian levels is shown in Figure 4; lower levels are obtained by lowpass filtering and spatially downsampling higher levels:

\[ G_{k+1}(n) = \left( \sum_{i=-c}^{c} w_i \cdot \downarrow\!\left[ G_k(2n-i) \right] \right) \bigg/ \sum_{i=-c}^{c} w_i . \]

We here use a binomial filter kernel (1, 4, 6, 4, 1) with c = 2.

The Laplacian levels are then computed as differences of adjacent Gaussian levels (the lowest level L_N is the same as the lowest Gaussian level G_N); before performing the subtraction, the lower level has to be brought back to a matching resolution again by upsampling (inserting zeros, i.e. blank frames) and subsequent lowpass filtering. In practice, the inserted frames can be ignored and their corresponding filter coefficients set to zero:

\[ L_k(n) = G_k(n) - \uparrow\!\left[ \left( \sum_{i \in P(n)} w_i \cdot G_{k+1}\!\left(\frac{n-i}{2}\right) \right) \bigg/ \sum_{i \in P(n)} w_i \right], \]

with P(n) = {j = -c, ..., c | (n - j) mod 2 = 0} giving the set of valid images on the lower level.

Based on these equations, we can now derive the number of lookahead items required for the generation of the Laplacian. For the upsampling of lower Gaussian levels, we need a lookahead of β = ⌊(c+1)/2⌋ images on each level, with ⌊·⌋ denoting floating-point truncation. Starting on the lowest level G_N, this implies that 2β + c images must be available on level G_{N-1} during the downsampling phase; we can repeatedly follow this argument and obtain λ_k = 2^{N-k} · (β + c) - c as the number of required lookahead images for level k.
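The temporal half of these equations can be sketched as follows (again our own illustration; frames are held in a plain Python list standing in for the streaming buffers discussed in Section 2.3, and boundary frames are handled by renormalizing over the available taps):

    import numpy as np

    W = np.array([1.0, 4.0, 6.0, 4.0, 1.0])  # binomial kernel, c = 2
    C = 2

    def temporal_downsample(frames):
        """G_{k+1}(n) = sum_i w_i * G_k(2n - i) / sum_i w_i (spatial part omitted)."""
        out = []
        for n in range(len(frames) // 2):
            acc, norm = 0.0, 0.0
            for i in range(-C, C + 1):
                if 0 <= 2 * n - i < len(frames):
                    acc += W[i + C] * frames[2 * n - i]
                    norm += W[i + C]
            out.append(acc / norm)
        return out

    def P(n):
        """Valid filter taps for upsampling: the lower-level frame (n - j)/2 exists."""
        return [j for j in range(-C, C + 1) if (n - j) % 2 == 0]

    def temporal_upsample(lower, n):
        """Interpolate frame n of the higher level from lower-level frames."""
        taps = [(W[j + C], lower[(n - j) // 2]) for j in P(n)
                if 0 <= (n - j) // 2 < len(lower)]
        norm = sum(w for w, _ in taps)
        return sum(w * f for w, f in taps) / norm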
2.2.3 Synthesis Phase

Figure 5: Synthesis step of the spatio-temporal Laplacian pyramid. Shown are both the Laplacian levels L_i and the (partially) reconstructed images R_i, which are based on lower levels with indices ≥ i; in practice, the same buffers can be used for both R and L. For example, to compute the reconstruction R_0(t) of the original image, we have to add L_0(t) to a spatio-temporally upsampled version of R_1. The second level L_1 is combined with the upsampling result of L_2 in R_1((t+c)/2) = R_1(t/2 + β_1) (see pseudocode). In this schematic overview, new frames are added on the right side and shifted leftwards with time.

Turning now to the synthesis phase of the Laplacian pyramid, we note from Figure 3 that the Laplacian levels are successively upsampled and added up to reconstruct the original image; this is simply the inverse of the "upsample-and-subtract" operation during the analysis phase. On the lowest level, R_N(n) = L_N(n); for higher levels, the intermediate reconstructed images are computed as

\[ R_k(n) = L_k(n) + \uparrow\!\left[ \left( \sum_{i \in P(n)} w_i \cdot R_{k+1}\!\left(\frac{n-i}{2}\right) \right) \bigg/ \sum_{i \in P(n)} w_i \right]. \tag{1} \]

Clearly, a further latency is incurred between the point in time for which bandpass information and the reconstructed or filtered image are available. Similar to the study of the analysis phase in the previous section, we can compute the number Λ_k of required lookahead items on each level by induction. On the lowest level L_N, again β = ⌊(c+1)/2⌋ images are required for the upsampling operation, which corresponds to 2β images on level N - 1. As can be seen in Figure 5, the result of the upsampling operation is added to the β-th lookahead item on level N - 1, so that Λ_{N-1} = 3β. Repeating this computation, we obtain Λ_k = (2^{N+1-k} - 1) · β; for L_0, however, no further upsampling is required, so it is possible to reduce the lookahead on the highest level to Λ_0 = (2^{N+1} - 2) · β.

In practice, we do not need to differentiate explicitly between L and R; the same buffers can be used for both L and R images. Care has to be taken then not to perform a desired modification of a given Laplacian level on a buffer that already contains information from lower levels as well (i.e. an R image).
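A small helper (our own sketch) makes these bookkeeping formulas concrete and also yields the total pyramid latency λ_0 + Λ_0 that is discussed in the next section:

    def analysis_lookahead(k, N, c=2):
        """lambda_k = 2^(N-k) * (beta + c) - c, with beta = floor((c+1)/2)."""
        beta = (c + 1) // 2
        return 2 ** (N - k) * (beta + c) - c

    def synthesis_lookahead(k, N, c=2):
        """Lambda_k = (2^(N+1-k) - 1) * beta, except that the highest level
        needs one upsampling step less: Lambda_0 = (2^(N+1) - 2) * beta."""
        beta = (c + 1) // 2
        if k == 0:
            return (2 ** (N + 1) - 2) * beta
        return (2 ** (N + 1 - k) - 1) * beta

    # Total input-to-output latency in frames for a five-level pyramid
    # (N = 4, as used in the experiment in Section 4.4): 46 + 30 = 76.
    N = 4
    print(analysis_lookahead(0, N) + synthesis_lookahead(0, N))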
                                                                               ther possibility to efficiently filter in time without lookahead is to
2.3 Pseudocode and Implementation

We are now ready to bring the above observations together and put them into pseudocode, see Algorithms 1 and 2. Based on P(n) above, the index function that determines which images are available on lower levels is P_k(t) = {j = -c, ..., c | (t/2^k + β_k - j) mod 2 = 0}. In the synthesis phase, the image offset β_k at which the recombination of L and R takes place can be set to zero on the highest level; we therefore use β_k = β for k > 0, and β_0 = 0.

From the pseudocode, a buffering scheme for the implementation directly follows. First, images from the Gaussian pyramid have to be stored; each level k needs at least λ_k lookahead images, one current image, and β_k history images. Trading memory requirements for computational costs, it is also possible to keep all images of the Gaussian pyramid in memory twice, once in the "correct" size and once in the downsampled version; for each frame of the input video, only one downsampling operation then has to be executed. In analogy to the Gaussian levels, both the Laplacian and the (partially) reconstructed levels L and R can be held together in one buffer per level k with Λ_k lookahead images, one current image, and β_k history images.

In practice, the output of the pyramid can be accessed only with a certain latency because of the symmetric temporal filters that require video frames from the future. Input images are fed into lookahead position λ_0 of buffer G_0, and images are shifted towards the "current" position by one position for every new video frame. This means that only λ_0 time steps after video frame I(t_0) has been added, the Gaussian images G_0 to G_N that represent I(t_0) at various spatio-temporal resolutions are available in the "current" positions of the Gaussian buffers. The resulting differences L_0 to L_N are then stored at the lookahead positions Λ_0 to Λ_N of the Laplacian buffers, respectively; here, different frequency bands can be accessed both for analysis and modification. Only Λ_0 time steps later does the input image I reappear after pyramid synthesis; overall, this leads to a pyramid latency between input and output of λ_0 + Λ_0 time steps.

The necessary buffering and the handling of lookahead frames could be reduced and simplified if causal filters were used; a further possibility to efficiently filter in time without lookahead is to use temporally recursive filters. However, any non-symmetry in the filters will introduce phase shifts. Particularly in the case of space-variant filtering (see below), this would produce image artefacts (such as a pedestrian with disconnected fast legs and a relatively slow upper body).

3 Gaze Visualization

In the previous section, we described the analysis and synthesis phases of a spatio-temporal Laplacian pyramid. However, the result of the synthesis phase is a mere reconstruction of the original image sequence; we want to filter the image sequence based on a list of gaze positions instead.
Algorithm 1 Pseudocode for one time step of the pyramid analysis phase.
Input: t: time step to update the pyramid for
       G_0, ..., G_N: levels of the Gaussian pyramid
       L_0, ..., L_N: levels of the Laplacian pyramid

  C_t = max({γ ∈ ℕ | 0 ≤ γ ≤ N, t mod 2^γ = 0})
  // Gaussian pyramid creation
  G_0(t + λ_0) = I(t + λ_0)
  for k = 1, ..., C_t do
      G_k(t/2^k + λ_k) = ( Σ_{i=-c..c} w_i · ↓[ G_{k-1}(t/2^{k-1} + 2λ_k - i) ] ) / Σ_{i=-c..c} w_i
  end for
  // Laplacian pyramid creation
  for k = 0, ..., C_t do
      if k = N then
          L_N(t/2^N + Λ_N) = G_N(t/2^N + Λ_N)
      else
          L_k(t/2^k + Λ_k) = G_k(t/2^k + Λ_k) - ↑[ ( Σ_{i∈P_k(t)} w_i · G_{k+1}(t/2^{k+1} + (Λ_k - i)/2) ) / Σ_{i∈P_k(t)} w_i ]
      end if
  end for

Algorithm 2 Pseudocode for one time step of the pyramid synthesis phase.
Input: t: time step to update the pyramid for
       G_0, ..., G_N: levels of the Gaussian pyramid
       L_0, ..., L_N: levels of the Laplacian pyramid

  C_t = max({γ ∈ ℕ | 0 ≤ γ ≤ N, t mod 2^γ = 0})
  for k = C_t, ..., 0 do
      if k = N then
          R_N(t/2^N + β_N) = L_N(t/2^N + β_N)
      else
          R_k(t/2^k + β_k) = L_k(t/2^k + β_k) + ↑[ ( Σ_{i∈P_k(t)} w_i · R_{k+1}(t/2^{k+1} + (β_k - i)/2) ) / Σ_{i∈P_k(t)} w_i ]
      end if
  end for

3.1 Space-Variant Pyramid Synthesis

We now introduce the concept of a coefficient map that indicates how spectral energy should be modified in each frequency band at each pixel of the output image sequence. We denote the coefficient map for level k at time t by W_k(t); the W_k have the same spatial resolution as the corresponding L_k, i.e. W/2^k by H/2^k pixels.

To bandpass-filter the image sequence, the Laplacian levels L_k are simply multiplied pixel-wise with the W_k prior to the recombination into R_k.

Algorithm 3 Pseudocode for one time step of the space-variant synthesis phase.
Input: t: time step to update the pyramid for
       G_0, ..., G_N: levels of the Gaussian pyramid
       L_0, ..., L_N: levels of the Laplacian pyramid
       W_0, ..., W_N: coefficient maps

  C_t = max({γ ∈ ℕ | 0 ≤ γ ≤ N, t mod 2^γ = 0})
  for k = C_t, ..., 0 do
      if k = N then
          R_N(t/2^N + β_N) = L_N(t/2^N + β_N)
      else
          R_k(x, y, t/2^k + β_k) = W_k(x, y, t/2^k + β_k) · L_k(x, y, t/2^k + β_k) + ↑[ ( Σ_{i∈P_k(t)} w_i · R_{k+1}(t/2^{k+1} + (β_k - i)/2) ) / Σ_{i∈P_k(t)} w_i ]
      end if
  end for

Based on the pseudocode (Algorithm 3), we can see that coefficient maps for different points in time are applied to the different levels in each synthesis step of the pyramid; this follows from the iterative recombination of L into the reconstructed levels. In practice, a more straightforward solution is to apply the coefficient maps corresponding to one time t to the farthest lookahead item Λ_k of each level L_k (i.e. right after the subtraction of adjacent Gaussian levels).

As noted before, in the following validation experiment we will use the same coefficient map for all levels (for computational efficiency, however, coefficient maps for lower levels can be stored with fewer pixels). In principle, this means that a similar effect could be achieved by computing the mean pixel intensity of the whole image sequence and then, depending on gaze position, smoothly blending between this mean value and each video pixel. However, for practical reasons, the lowest level of the pyramid does not represent the "true" DC (the mean of the image sequence), but merely a very strongly lowpass-filtered video version; this means that some coarse spatio-temporal structure remains even in regions where all contrast in higher levels is removed by setting the coefficient map to zero. The temporal multiresolution character of the pyramid also adds smoothness to changes in the coefficient maps over time; because temporal levels are updated at varying rates, such changes are introduced gradually. Finally, by using different coefficient maps for each level, it is trivially possible to highlight certain frequency bands, which is impossible based on a computation of the mean alone.
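In the spatial-only terms of the sketch from Section 2.1, the space-variant synthesis amounts to one extra pixelwise multiplication per band before recombination (again our own illustration; expand and the band list refer to the hypothetical helpers defined in that earlier sketch):

    def space_variant_reconstruct(lap, coeff_maps):
        """Multiply each Laplacian band pixelwise with its coefficient map W_k,
        then recombine as usual; coeff_maps[k] has the same shape as lap[k].
        Relies on expand() from the spatial pyramid sketch above."""
        img = coeff_maps[-1] * lap[-1]
        for band, wmap in zip(reversed(lap[:-1]), reversed(coeff_maps[:-1])):
            img = wmap * band + expand(img, band.shape)
        return img

Setting all maps to ones reproduces the plain reconstruction; setting a map to zero in some region removes that frequency band there.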
4 Attentional Guidance

Pyramid-based rendering of video as a function of gaze has been shown to have a guiding effect on eye movements. For example, the introduction of peripheral temporal blur on a gaze-contingent display reduces the number of large-amplitude saccades [Barth et al. 2006], even though the visibility of such blur is low [Dorr et al. 2005]. Using a real-time gaze-contingent version of a spatial Laplacian pyramid, locally reducing (spatial) spectral energy at likely fixation points also changes eye movement characteristics [Dorr et al. 2008].

In the following, we will therefore briefly summarize how our gaze visualization algorithm can be applied in a learning task to guide the student's gaze. For further details of this experiment, we refer to [Jarodzka et al. 2010b].

4.1 Perceptual Learning

In many problem domains, experts develop efficient eye movement strategies because the underlying problem requires substantial visual search. Examples include the analysis of radiograms [Lesgold et al. 1988], driving [Underwood et al. 2003], and the classification of fish locomotion [Jarodzka et al. 2010a]. In order to aid novices in acquiring the efficient eye movement strategies of an expert, it is possible to use cueing to guide their attention towards relevant stimulus locations; however, it often remains unclear where and how to cue the user. Van Gog et al. [2009] guided attention during problem-solving tasks by directly displaying the eye movements of an expert made while performing the same task on modeling examples, but found that the attentional guidance actually decreased novices' subsequent test performance instead of facilitating the learning process. One possible explanation of this effect could be that the chosen method of guidance (a red dot at the expert's gaze position that grew in size with fixation duration) was not optimal, because the gaze marker covered exactly those visual features it was supposed to highlight, and its dynamical nature might have distracted the observers. To avoid this problem, we here use the space-variant filtering algorithm presented in the previous sections to render instructional videos such that the viewer's attention is guided to those areas that were attended by the expert. However, instead of altering these attended areas, we decrease spatio-temporal contrast (i.e. edge and motion intensity) elsewhere, in order to increase the relative visual saliency of the problem-relevant areas without covering them or introducing artefacts.
4.2 Stimulus Material and Experimental Setup

Eight videos of different fish species with a duration of 4 s each were recorded, depicting different locomotion patterns. They had a spatial resolution of 720 by 576 pixels and a frame rate of 25 frames per second. Four of these videos were shown in a continuous loop to an expert on fish locomotion (a professor of marine zoology), and his eye movements were collected using a Tobii 1750 remote eye tracker running at 50 Hz. Simultaneously, a spoken didactical explanation of the locomotion pattern (i.e. how different body parts moved) was recorded. These four videos were shown to 72 subjects (university students without prior task experience) in a training phase either as-is, with the expert's eye movements marked by a simple yellow disk at gaze position, or with attentional guidance by the pyramid-based contrast reduction. In the subsequent test or recall phase, the remaining four videos were shown to the subjects without any modification. After presentation, subjects had to apply the knowledge acquired during the training phase and had to name and describe the locomotion pattern displayed in each test video; the number of correct answers yielded a performance score.
4.3 Gaze Filtering

Functionally, a sequence of eye movements consists of a series of fixations, where eye position remains constant, and saccades, during which eye position changes rapidly (smooth pursuit movements can here be understood as fixations where gaze position remains constant on a moving object). In practice, however, the eye position as measured by the eye tracker hardly ever stays constant from one sample to the next; the fixational instability of the oculomotor system, minor head movements, and noise in the camera system of the eye tracker all contribute to the effect that the measured eye position exhibits substantial jitter. If this jitter were replayed to the novice, such constant erratic motion might distract the observer from the very scene that gaze guidance is supposed to highlight. In order to reduce the jitter, raw gaze data was filtered with a temporal Gaussian lowpass filter with a support of 200 ms and a standard deviation of 42 ms.
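At the 50 Hz sampling rate of the eye tracker, a 200 ms support and a 42 ms standard deviation correspond to roughly 11 filter taps and about 2.1 samples, respectively; a minimal smoothing sketch (the (n_samples, 2) array layout is our own assumption):

    import numpy as np

    def smooth_gaze(gaze_xy, fs=50.0, support_ms=200.0, sigma_ms=42.0):
        """Temporal Gaussian lowpass filter over an (n_samples, 2) gaze array."""
        half = int(round(support_ms / 1000.0 * fs / 2))   # +/- 5 samples at 50 Hz
        sigma = sigma_ms / 1000.0 * fs                    # about 2.1 samples
        taps = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
        taps /= taps.sum()
        # filter x and y independently; edges reuse the border samples
        pad = np.pad(gaze_xy, ((half, half), (0, 0)), mode='edge')
        out = np.empty_like(gaze_xy, dtype=float)
        for d in range(2):
            out[:, d] = np.convolve(pad[:, d], taps, mode='valid')
        return out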
4.4 Space-Variant Filtering and Colour Removal

Figure 6: Eccentricity-dependent resolution map: at the centre of fixation, spectral energy remains the same; energy is increasingly reduced with increasing distance from gaze position.

A Laplacian pyramid with five levels was used; coefficient maps were created in such a way that the original image sequence was reconstructed faithfully in the fixated area (the weight of all levels during pyramid synthesis was set to 1.0) and spatio-temporal changes were diminished (all level weights set to 0.0) in those areas that the expert had seen only peripherally. On the highest level, the first zone was defined by a radius of 32 pixels around gaze position, and weights were set to 0.0 outside a radius of 256 pixels; these radii approximately corresponded to 1.15 and 9.2 degrees of visual angle, respectively. In parafoveal vision, weights were gradually decreased from 1.0 to 0.0 for a smooth transition, following a Gaussian falloff with a standard deviation of 40 pixels (see Figure 6). Furthermore, these maps were produced not simply by placing a mask at the current gaze position in each video frame; instead, masks for all gaze positions of the preceding and following 300 ms were superimposed and the coefficient map was then normalized to a maximum of 1.0. During periods of fixation, this superposition had little or no effect; during saccades, however, this procedure elongated the radially symmetric coefficient map along the direction of the saccade. Thus, the observer was able to follow the expert's saccades, and unpredictable large displacements of the unmodified area were prevented. Finally, colour saturation was also removed from non-attended areas, similar to the reduction of spectral energy; here, complete removal of colour started outside a radius of 384 pixels around gaze, and the Gaussian falloff in the transition area had a standard deviation of 67 pixels. Note that these parameters were determined rather informally to find a reasonable trade-off between a focus that would be too restricted (if the focus were only a few pixels wide, discrimination of relevant features would be impossible) and a wide focus that would be without effect (if the unmodified area encompassed the whole stimulus). As such, these parameters are likely to be specific to the stimulus material used here. For a thorough investigation of the visibility of peripherally removed colour saturation using a gaze-contingent display, we refer to Duchowski et al. [2009].
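A sketch of how such a coefficient map could be built for one frame (the radii and falloff are taken from the text; combining the per-sample masks with a pixelwise maximum and the handling of the 600 ms gaze window are our own simplifications):

    import numpy as np

    def coefficient_map(gaze_window, width, height,
                        r_inner=32.0, r_outer=256.0, sigma=40.0):
        """1.0 inside r_inner around each gaze sample, 0.0 outside r_outer,
        Gaussian falloff in between; masks for all samples in the
        surrounding 600 ms window are superimposed and the result is
        normalized to a maximum of 1.0."""
        ys, xs = np.mgrid[0:height, 0:width]
        wmap = np.zeros((height, width))
        for gx, gy in gaze_window:
            d = np.sqrt((xs - gx) ** 2 + (ys - gy) ** 2)
            m = np.exp(-0.5 * ((d - r_inner) / sigma) ** 2)
            m[d <= r_inner] = 1.0
            m[d >= r_outer] = 0.0
            wmap = np.maximum(wmap, m)   # superposition of per-sample masks
        peak = wmap.max()
        return wmap / peak if peak > 0 else wmap

During a saccade, the samples in gaze_window spread out along the saccade direction, which elongates the unmodified region as described above.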
For an example frame, see Figure 1; a demo video is available online at http://www.gazecom.eu/demo-material.

4.5 Results

Previous research has already shown that providing a gaze marker in the highly perceptual task of classifying fish locomotion facilitates perceptual learning: subjects look at relevant movie regions for a longer time and take less time to find relevant locations after stimulus onset, which in turn results in higher performance scores in subsequent tests on novel stimuli [Jarodzka et al. 2009]. The gaze visualization technique presented here does not cover these relevant locations; subjects' visual search performance is improved even beyond that obtained with the simple gaze marker. Time needed to find relevant locations after stimulus onset decreases by 21.26% compared to the gaze marker condition and by 27.53% compared to the condition without any guidance. Moreover, dwell time on the relevant locations increases by 7.25% compared to the gaze marker condition and by 48.82% compared to the condition without any guidance. For a more in-depth analysis, see [Jarodzka et al. 2010b].

5 Conclusion

We have presented a novel algorithm to perform space-variant filtering of a movie based on a spatio-temporal Laplacian pyramid. One application is the visualization of eye movements on videos; spatio-temporal contrast is modified as a function of gaze density, i.e. spectral energy is reduced in regions of low interest. In a validation experiment, subjects watched instructional videos on fish locomotion either with or without a visualization of the eye movements of an expert. We were able to show that on novel test stimuli, subjects who had received such information performed better than subjects who had not benefited from the expert's eye movements during training, and that the gaze visualization technique presented here facilitated learning better than a simple gaze display (yellow gaze marker). In principle, any visualization technique that reduces the relative visibility of those regions not attended by the expert might have a similar effect; our choice of this particular technique was motivated by our work on eye movement prediction [Dorr et al. 2008; Vig et al. 2009], which shows that spectral energy is a good predictor of eye movements. Ultimately, we intend to use similar techniques in a gaze-contingent fashion in order to guide the gaze of an observer [Barth et al. 2006].

Acknowledgements

Our research has received funding from the European Commission within the project GazeCom (contract no. IST-C-033816) of the 6th Framework Programme. All views expressed herein are those of the authors alone; the European Community is not liable for any use made of the information. Halszka Jarodzka was supported by the Leibniz-Gemeinschaft within the project "Resource-adaptive design of visualizations for supporting the comprehension of complex dynamics in the Natural Sciences". We thank Martin Böhme for comments on a first draft of this manuscript.

References

Barth, E., Dorr, M., Böhme, M., Gegenfurtner, K. R., and Martinetz, T. 2006. Guiding the mind's eye: improving communication and vision by external control of the scanpath. In Human Vision and Electronic Imaging, B. E. Rogowitz, T. N. Pappas, and S. J. Daly, Eds., vol. 6057 of Proc. SPIE. Invited contribution for a special session on Eye Movements, Visual Search, and Attention: a Tribute to Larry Stark.

Böhme, M., Dorr, M., Martinetz, T., and Barth, E. 2006. Gaze-contingent temporal filtering of video. In Proceedings of Eye Tracking Research & Applications (ETRA), 109–115.

Burt, P. J., and Adelson, E. H. 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications 31, 4, 532–540.

Delabarre, E. B. 1898. A method of recording eye-movements. American Journal of Psychology 9, 4, 572–574.

Dorr, M., Böhme, M., Martinetz, T., and Barth, E. 2005. Visibility of temporal blur on a gaze-contingent display. In APGV 2005 ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization, 33–36.

Dorr, M., Vig, E., Gegenfurtner, K. R., Martinetz, T., and Barth, E. 2008. Eye movement modelling and gaze guidance. In Fourth International Workshop on Human-Computer Conversation.

Duchowski, A. T., Bate, D., Stringfellow, P., Thakur, K., Melloy, B. J., and Gramopadhye, A. K. 2009. On spatiochromatic visual sensitivity and peripheral color LOD management. ACM Transactions on Applied Perception 6, 2, 1–18.

Geisler, W. S., and Perry, J. S. 2002. Real-time simulation of arbitrary visual fields. In ETRA '02: Proceedings of the 2002 symposium on Eye tracking research & applications, ACM, New York, NY, USA, 83–87.

Heminghous, J., and Duchowski, A. T. 2006. iComp: a tool for scanpath visualization and comparison. In APGV '06: Proceedings of the 3rd symposium on Applied perception in graphics and visualization, ACM, New York, NY, USA, 152–152.

Jarodzka, H., Scheiter, K., Gerjets, P., van Gog, T., and Dorr, M. 2009. How to convey perceptual skills by displaying experts' gaze data. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., Austin, TX: Cognitive Science Society, 2920–2925.

Jarodzka, H., Scheiter, K., Gerjets, P., and van Gog, T. 2010a. In the eyes of the beholder: How experts and novices interpret dynamic stimuli. Journal of Learning and Instruction 20, 2, 146–154.

Jarodzka, H., van Gog, T., Dorr, M., Scheiter, K., and Gerjets, P. 2010b. Guiding attention guides thought, but what about learning? Eye movements in modeling examples. (submitted).

Lankford, C. 2000. Gazetracker: software designed to facilitate eye movement analysis. In ETRA '00: Proceedings of the 2000 symposium on Eye tracking research & applications, ACM, New York, NY, USA, 51–55.
L ESGOLD , A., RUBINSON , H., F ELTOVICH , P., G LASER , R.,
   K LOPFER , D., AND WANG , Y. 1988. Expertise in a com-
   plex skill: diagnosing X-ray pictures. In The nature of exper-
   tise, M. T. H. Chi, R. Glaser, and M. Farr, Eds. Hillsdale, NJ:
   Erlbaum, 311–342.
N IKOLOV, S. G., N EWMAN , T. D., B ULL , D. R., C ANAGARA -
   JAH , C. N., J ONES , M. G., AND G ILCHRIST, I. D. 2004.
   Gaze-contingent display using texture mapping and OpenGL:
   system and applications. In Eye Tracking Research & Appli-
   cations (ETRA), 11–18.
R AMLOLL , R., T REPAGNIER , C., S EBRECHTS , M., AND
   B EEDASY, J. 2004. Gaze data visualization tools: Opportu-
   nities and challenges. In IV ’04: Proceedings of the Information
   Visualisation, Eighth International Conference, IEEE Computer
   Society, Washington, DC, USA, 173–180.
STONE, L. S., MILES, F. A., AND BANKS, M. S. 2003. Linking eye movements and perception. Journal of Vision 3, 11 (11), i–iii.

UNDERWOOD, G., CHAPMAN, P., BROCKLEHURST, N., UNDERWOOD, J., AND CRUNDALL, D. 2003. Visual attention while driving: Sequences of eye fixations made by experienced and novice drivers. Ergonomics 46, 629–646.

UZ, K. M., VETTERLI, M., AND LE GALL, D. J. 1991. Interpolative multiresolution coding of advanced television with compatible subchannels. IEEE Transactions on Circuits and Systems for Video Technology 1, 1, 86–99.

VAN GOG, T., JARODZKA, H., SCHEITER, K., GERJETS, P., AND PAAS, F. 2009. Effects of attention guidance during example study by showing students the models’ eye movements. Computers in Human Behavior 25, 785–791.
VELICHKOVSKY, B., POMPLUN, M., AND RIESER, J. 1996. Attention and communication: Eye-movement-based research paradigms. In Visual Attention and Cognition, W. H. Zangemeister, H. S. Stiehl, and C. Freksa, Eds. Amsterdam, Netherlands: Elsevier Science, 125–154.

VIG, E., DORR, M., AND BARTH, E. 2009. Efficient visual coding and the predictability of eye movements on natural movies. Spatial Vision 22, 5, 397–408.
ŠPAKOV, O., AND MINIOTAS, D. 2007. Visualization of eye gaze data using heat maps. Electronics and Electrical Engineering 2, 55–58.
ŠPAKOV, O., AND RÄIHÄ, K.-J. 2008. KiEV: A tool for visualization of reading and writing processes in translation of text. In ETRA ’08: Proceedings of the 2008 symposium on Eye tracking research & applications, ACM, New York, NY, USA, 107–110.
WOODING, D. S. 2002. Fixation maps: Quantifying eye-movement traces. In ETRA ’02: Proceedings of the 2002 symposium on Eye tracking research & applications, ACM, New York, NY, USA, 31–36.

YARBUS, A. L. 1967. Eye Movements and Vision. Plenum Press, New York.


  • 1. Space-Variant Spatio-Temporal Filtering of Video for Gaze Visualization and Perceptual Learning Michael Dorr∗ Halszka Jarodzka† Institute for Neuro- and Bioinformatics Knowledge Media Research Center University of L¨ beck u T¨ bingen u ‡ Erhardt Barth Institute for Neuro- and Bioinformatics University of L¨ beck u Abstract We introduce an algorithm for space-variant filtering of video based on a spatio-temporal Laplacian pyramid and use this algorithm to render videos in order to visualize prerecorded eye movements. Spatio-temporal contrast and colour saturation are reduced as a function of distance to the nearest gaze point of regard, i.e. non- fixated, distracting regions are filtered out, whereas fixated image regions remain unchanged. Results of an experiment in which the eye movements of an expert on instructional videos are visualized with this algorithm, so that the gaze of novices is guided to relevant image locations, show that this visualization technique facilitates the novices’ perceptual learning. CR Categories: I.4.9 [Image Processing and Computer Vi- sion]: Applications; I.4.3 [Image Processing and Computer Vision]: Enhancement—Filtering; I.4.10 [Image Processing and Computer Figure 1: Stillshot from an instructional video on classifying fish Vision]: Image Representation—Multidimensional; K.3.1 [Com- locomotion patterns. The eye movements of the expert giving voice- puters and Education]: Computer Uses in Education—Computer- over explanations are visualized by space-variant filtering on a assisted instruction (CAI) spatio-temporal Laplacian pyramid: spatio-temporal contrast and colour saturation are reduced in unattended areas. This visual- Keywords: gaze visualization, space-variant filtering, spatio- ization technique aids novices in acquiring the expert’s perceptual temporal Laplacian pyramid, perceptual learning skills. 1 Introduction Better suited for visual inspection are approaches that use the stim- Humans move their eyes around several times per second to suc- ulus and enrich it with eye movement data; in the classical paper of cessively sample visual scenes with the high-resolution centre of Yarbus [1967], gaze traces overlaid on the original images imme- the retina. The direction of gaze is tightly linked to attention, and diately show the regions that were preferentially looked at by the what people perceive ultimately depends on where they look [Stone subjects. Because of the noisy nature of both eye movements and et al. 2003]. Naturally, the ability to record eye movement data led their measurements, there is also an indirect indication of fixation to the need for meaningful visualizations. One-dimensional plots duration (traces are denser in areas of longer fixation). However, of the horizontal and vertical components of eye position over time such abstract information can also be extracted from the raw data have been in use since the very first gaze recording experiments and presented in condensed form: for example, bars of different size (Delabarre [1898] affixed a small cap on the cornea to transduce are placed in a three-dimensional view of the original stimulus to eye movements onto a rotating drum, using plaster of Paris as glue). denote fixation duration in [Lankford 2000]; in a more application- ˇ specific manner, Spakov and R¨ ih¨ [2008] annotate text with ab- a a Such plots are useful for detailed quantitative analyses, but not very intuitively interpreted. 
Other tools supporting interpretation of the stract information on gaze behaviour for the analysis of translation data include the visualization of gaze density by means of clustered processes. gaze samples [Heminghous and Duchowski 2006] or the visualiza- Another common method is the use of so-called fixation maps tion of other features such as fixation duration [Ramloll et al. 2004]. [Velichkovsky et al. 1996; Wooding 2002]. Here, a probability ∗ e-mail: density map is computed by the superposition of Gaussians, each dorr@inb.uni-luebeck.de centred at a single fixation (or raw gaze sample), with a subse- † e-mail: h.jarodzka@iwm-kwmrc.de ‡ e-mail: barth@inb.uni-luebeck.de quent normalization step. Areas that were fixated more often are thus assigned higher probabilities; by varying the width of the un- Copyright © 2010 by the Association for Computing Machinery, Inc. derlying Gaussians, it is possible to vary the distance up to which Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed two fixations are considered similar. Based on this probability for commercial advantage and that copies bear this notice and the full citation on the map, the stimulus images are processed so that for example lumi- first page. Copyrights for components of this work owned by others than ACM must be nance is gradually reduced in areas that received little attention; so- honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on called heat maps mark regions of interest with transparently over- servers, or to redistribute to lists, requires prior specific permission and/or a fee. ˇ Request permissions from Permissions Dept, ACM Inc., fax +1 (212) 869-0481 or e-mail laid colours. In [Spakov and Miniotas 2007], the authors add “fog” permissions@acm.org. to render visible only the attended parts of the stimulus. ETRA 2010, Austin, TX, March 22 – 24, 2010. © 2010 ACM 978-1-60558-994-7/10/0003 $10.00 307
  • 2. For dynamic stimuli, such as movies, all the above techniques can be applied as well; one straightforward extension from images to G0 − L0 image sequences would be to apply the fixation map technique to every video frame individually. Care has to be taken, however, to appropriately filter the gaze input in order to ensure a smooth tran- sition between video frames. In this paper, we present an algorithm to visualize dynamic gaze ↓ ↑ density maps by locally modifying spatio-temporal contrast on a spatio-temporal Laplacian pyramid. In regions of low inter- est, spectral energy is reduced, i.e. edge and motion intensity are dampened, whereas regions of high interest remain as in the orig- inal stimulus. Conceptually, this algorithm is related to gaze- G1 − L1 contingent displays simulating visual fields based on Gaussian pyramids [Geisler and Perry 2002; Nikolov et al. 2004]; in these approaches, however, only fine spatial details were blurred selec- ↓ ↑ tively. In earlier work, we have already extended these algorithms to the temporal domain [B¨ hme et al. 2006], and we here com- o bine both spatial and temporal filtering. Furthermore, the work G2 L2 presented here goes beyond blurring, i.e. the specification of a sin- gle cutoff frequency per output pixel, and allows to assign indi- vidual weights to each frequency band. This means that, for ex- Figure 2: Analysis phase of a Laplacian pyramid in space. Based ample, spatio-temporal contrast can be modified while leaving de- on the Gaussian pyramid on the left side, which stores successively tails intact. However, although our implementation of this algo- smaller image versions (with higher-frequency content successively rithm achieves more than 50 frames per second rendering perfor- removed), differences of Gaussian pyramid levels are formed to ob- mance on a commodity PC, it cannot be used for real-time gaze- tain individual frequency bands (right side). To be able to form contingent applications, because there all levels of the underlying these differences, lower levels have to be upsampled before sub- pyramid need to be upsampled to full temporal resolution for ev- traction (middle). The gray bars indicate – relative to the original ery video frame. Its purpose is the off-line visualization of prere- spectrum – what frequency band is stored in each image. The ex- corded gaze patterns. A computationally much more challenging tension into the temporal domain results in lower frame rates for version of a spatio-temporal Laplacian pyramid that is suitable also the smaller video versions (not shown). for gaze-contingent displays is the topic of a forthcoming paper. Pyramid-based rendering as a function of gaze has been shown to have a guiding effect on eye movements [Dorr et al. 2008; Barth L0 + et al. 2006]. To further demonstrate the usefulness of our algorithm, we will present some results from a validation experiment in which students received instructional videos either with or without a vi- R0 = G0 sualization of the eye movements of an expert watching the same stimulus. Results show that the visualization technique presented here indeed facilitates perceptual learning and improves students’ later visual search performance on novel stimuli. L1 + ↑ R1 2 Laplacian Pyramid in Space and Space- Time L2 ↑ R2 The so-called Laplacian pyramid serves as an efficient bandpass representation of an image [Burt and Adelson 1983]. In the fol- Figure 3: Synthesis phase of a Laplacian pyramid in space. 
The lowing section, we will briefly review its application to images and Laplacian levels are iteratively upsampled to obtain a series of re- then extend the algorithm to the spatio-temporal domain. We will constructed images RN , RN −1 , . . . , R0 with increasing cutoff fre- here use an isotropic pyramid, i.e. all spatial and temporal dimen- quencies. If the Ln remain unchanged, R0 is an exact reproduction sions are treated equally; this results in a bandpass representation in of the original input image G0 . which e.g. low spatial and low temporal and high spatial and high temporal frequencies are represented together, respectively. For a finer-grained decomposition of the image sequence into spatio- temporal frequency bands, an anisotropic Laplacian pyramid could 2.1 Spatial Domain be used instead. Using such a decomposition, one could also ob- tain frequency bands of high spatial but low temporal frequencies The Laplacian pyramid is based on a Gaussian multiresolution etc. For a straightforward implementation, one might first create a pyramid, which stores successively smaller versions of an image; spatial pyramid for each frame of the input sequence, then decom- usually, resolution is reduced by a factor of two in each downsam- pose each level of that spatial pyramid in time (as in Section 2.2.2, pling step (for two-dimensional images, the number of pixels is thus but omitting the spatial up- and downsampling). However, the finer reduced by a factor of four). Prior to each downsampling step, the spectral resolution comes at the cost of a significantly increased image is appropriately lowpass filtered so that high-frequency con- number of pixels that need to be stored and processed; this increase tent is removed; the lower-resolution downsampling result then ful- is on the order of 1.16 · T times as many pixels for an anisotropic fills the conditions of the Nyquist theorem and can represent the pyramid with T temporal levels. (filtered) image without aliasing. For a schematic overview, we re- 308
  • 3. G0 (0) G0 (1) G0 (2) G0 (3) G0 (4) G0 (5) G0 (6) G0 (7) G0 (8) w−2 w−1 w0 w1 w2 G1 (0) G1 (1) G1 (2) G1 (3) G1 (4) w−2 w−1 w0 w1 w2 G2 (0) G2 (1) G2 (2) Figure 4: Spatio-temporal Gaussian pyramid with three levels and c = 2. Lower levels have reduced resolution both in space and in time. Note that history and lookahead video frames are required because of the temporal filter with symmetric kernel, e.g. computation of G2 (1) depends on G1 (4), which in turn depends on G0 (6). fer to Figures 2 and 3. (again, resolution is reduced by a factor of two in all dimensions with increasing k); the intermediate steps during the iterative re- 2.2 Spatio-Temporal Domain construction of the original signal are denoted as Rk (t). The temporal filtering which is required for temporal down- and The Gaussian pyramid algorithm was first applied to the temporal upsampling introduces a latency (see next sections). The number domain by Uz et al. [1991]. In analogy to dropping every other of lookahead items required on level k is denoted by λk for the pixel during the downsampling step, every other frame of an image analysis phase and by Λk for the synthesis phase. sequence is discarded to obtain lower pyramid levels (cf. Figure 4). In the temporal domain, however, the problem arises that the num- 2.2.2 Analysis Phase ber of frames to process is not necessarily known in advance and is potentially infinite; it is therefore not feasible to store the whole To compute the Laplacian levels, the Gaussian pyramid has to be image sequence in memory. Nevertheless, video frames from the created first (see Figure 2). The relationship of different Gaussian past and the future need to be accessed during the lowpass filtering levels is shown in Figure 4; lower levels are obtained by lowpass step prior to the downsampling operation; thus, care has to be taken filtering and spatially downsampling higher levels: which frames to buffer. In the following, we will refer to these , c frames as history and lookahead frames, respectively. X c X Gk+1 (n) = wi · ↓ [Gk (2n − i)] wi . Both for the subtraction of adjacent Gaussian pyramid levels (to i=−c i=−c create Laplacian levels) and for the reconstruction step (in which We here use a binomial filter kernel (1, 4, 6, 4, 1) with c = 2. the Laplacian levels are recombined), lower levels first have to be upsampled to match the resolution of the higher level. Following The Laplacian levels are then computed as differences of adjacent these upsampling steps, the results have to be filtered to interpo- Gaussian levels (the lowest level LN is the same as the lowest Gaus- late at the inserted pixels and frames; again, history and lookahead sian level GN ); before performing the subtraction, the lower level video frames are required. We will now describe these operations in has to be brought back to a matching resolution again by inserting more detail and analyse the number of video frames to be buffered. zeros (blank frames) to upsample and subsequent lowpass filtering. In practice, the inserted frames can be ignored and their correspond- 2.2.1 Notation ing filter coefficients are set to zero: x2 3 ? X „ «, X The sequence of input images is denoted by I(t); input images have ? n−i a size of W by H pixels and an arbitrary number of colour chan- Lk (n) = Gk (n)−?4 ? wi · Gk+1 wi 5 , ? i∈P (n) 2 i∈P (n) nels (individual channels are treated separately). A single pixel at location (x, y) and time t is referred to as I(t)(x, y); in the follow- with P (n) = {j = −c, . . . 
, c | (n − j) mod 2 = 0} giving the set ing, operations on whole images, such as addition, are to be applied of valid images on the lower level. pixelwise to all pixels. Based on these equations, we can now derive the number of looka- The individual levels of a Gaussian multiresolution pyramid with head items required for the generation of the Laplacian. For the N + 1 levels are referred to as Gk (t), 0 ≤ k ≤ N . The highest upsampling of lower Gaussian levels, we need a lookahead of level G0 is the same as the input sequence; because of the spatio- β = c+1 images on each level, with denoting floating-point 2 temporal downsampling, lower levels have fewer pixels and a lower truncation. Starting on the lowest level GN , this implies that 2β + c frame rate, so that Gk (n) has a spatial resolution of W/2k by H/2k images must be available on level GN −1 during the downsam- pixels and corresponds to the same point in time as G0 (2k n). Spa- pling phase; we can repeatedly follow this argument and obtain tial up- and downsampling operations on an image I are denoted λk = 2N −k · (β + c) − c as the number of required lookahead as ↑ [I] and ↓ [I], respectively. For time steps t that are not a images for level k. multiple of 2N , not all pyramid levels have a corresponding image Gk (t/2k ). Therefore, Ct denotes the highest index of levels with 2.2.3 Synthesis Phase valid images at time t, i.e. Ct is the largest integer with Ct ≤ N and t mod 2Ct = 0. Similar to the Gaussian levels Gk , we re- Turning now to the synthesis phase of the Laplacian pyramid, we fer to the levels of the Laplacian pyramid as Lk (t), 0 ≤ k ≤ N note from Figure 3 that the Laplacian levels are sucessively upsam- 309
  • 4. R0 (t − 1) R0 (t) L0 (t + 1) L0 (t + Λ0 ) · w−2 w0 w2 R1 ( t+c ) 2 L1 (t/2 + Λ1 ) w−1 · w1 L2 ( t/4 + Λ2 ) Figure 5: Synthesis step of spatio-temporal Laplacian pyramid. Here shown are both the Laplacian levels Li and the (partially) reconstructed images Ri , which are based on lower levels with indices ≥ i; in practice, the same buffers can be used for both R and L. For example, to compute the reconstruction R0 (t) of the original image, we have to add L0 (t) to a spatio-temporally upsampled version of R1 . The second level L1 is combined with the upsampling result of L2 in R1 ( t+c ) = R1 ( 2 + β1 ) (see pseudocode). In this schematic overview, new frames 2 t are added on the right side and shifted leftwards with time. pled and added up to reconstruct the original image; this simply Gaussian pyramid in memory twice, once in the “correct” size and is the inverse of the “upsample–and–subtract” operation during the once in the downsampled version; for each frame of the input video, analysis phase. On the lowest level, RN (n) = LN (n); for higher only one downsampling operation has to be executed then. In anal- levels, the intermediate reconstructed images are computed as ogy to the Gaussian levels, both the Laplacian and the (partially) x 3 reconstructed levels L and R can be held together in one buffer per X ?» „ «, X level k with Λk lookahead, one current image, and the βk history. ? n−i Rk (n) = Lk (n) + wi ? Rk+1 ? wi 5. ? 2 i∈P (n) i∈P (n) (1) In practice, the output of the pyramid can be accessed only with Clearly, a further latency is incurred between the point in time for a certain latency because of the symmetric temporal filters that re- which bandpass information and the reconstructed or filtered image quire video frames from the future. Input images are fed into looka- are available. Similar to the study of the analysis phase in the pre- head position λ0 of buffer G0 , and images are shifted towards the vious section, we can compute the number Λk of required looka- “current” position by one position for every new video frame. This head items on each level by induction. On the lowest level LN , means that only λ0 many time steps after video frame I(t0 ) has again β = c+1 images are required for the upsampling opera- been added, the Gaussian images G0 to GN that represent I(t0 ) 2 tion, which corresponds to 2β images on level N − 1. As can be at various spatio-temporal resolutions are available in the “current” seen in Figure 5, the result of the upsampling operation is added positions of the Gaussian buffers. The resulting differences L0 to to the β-th lookahead item on level N − 1, so that ΛN −1 = 3β. LN then are stored at the lookahead positions Λ0 to ΛN of the Repeating this computation, we obtain Λk = (2N +1−k − 1) · β; for Laplacian buffers, respectively; here, different frequency bands can L0 , however, no further upsampling is required, so it is possible to be accessed both for analysis and modification. Only Λ0 time steps reduce the lookahead on the highest level to Λ0 = (2N +1 − 2) · β. later does the input image I reappear after pyramid synthesis; over- all, this leads to a pyramid latency between input and output of In practice, we do not need to differentiate explicitly between L and λ0 + Λ0 time steps. R; the same buffers can be used for both L and R images. 
Care has to be taken then not to perform a desired modification of a given Laplacian level on a buffer that already contains information from The necessary buffering and the handling of lookahead frames lower levels as well (i.e. an R image). could be reduced and simplified if causal filters were used; a fur- ther possibility to efficiently filter in time without lookahead is to 2.3 Pseudocode and Implementation use temporally recursive filters. However, any non-symmetry in the filters will introduce phase shifts. Particularly in the case of We are now ready to bring together the above observations and space-variant filtering (see below), this would produce image arte- put them into pseudocode, see Algorithms 1 and 2. Based on facts (such as a pedestrian with disconnected – fast – legs and – P (n) above, the index function that determines which images relatively slow – upper body). are available` on lower levels in the following is Pk (t) = {j = ´ −c, . . . , c | 2tk + βk − j mod 2 = 0}. In the synthesis phase, the image offset βk at which the recombination L and R takes place 3 Gaze Visualization can be set to zero on the highest level; we therefore use βk = β for k > 0, β0 = 0. From the pseudocode, a buffering scheme for the implementation In the previous section, we described the analysis and synthesis directly follows. First, images from the Gaussian pyramid have to phase of a spatio-temporal Laplacian pyramid. However, the re- be stored; each level k needs at least λk lookahead images, one sult of the synthesis phase is a mere reconstruction of the original current image, and βk history. Trading memory requirements for image sequence; we want to filter the image sequence based on a computational costs, it is also possible to keep all images of the list of gaze positions instead. 310
  • 5. Algorithm 1 Pseudocode for one time step of the pyramid analysis Algorithm 2 Pseudocode for one time step of the pyramid synthesis phase. phase. Input: t Time step to update the pyramid for Input: t Time step to update the pyramid for G0 , . . . , GN Levels of the Gaussian pyramid G0 , . . . , GN Levels of the Gaussian pyramid L0 , . . . , L N Levels of the Laplacian pyramid L0 , . . . , L N Levels of the Laplacian pyramid Ct = max({γ ∈ N | 0 ≤ γ ≤ N , t mod 2γ = 0}) Ct = max({γ ∈ N | 0 ≤ γ ≤ N , t mod 2γ = 0}) Gaussian pyramid creation for k = Ct , . . . , 0 do G0 (t + λ0 ) = I(t + λ0 ) if k = N then for k = 1, . . . , Ct do „ « „ « t t „ « RN + βN = L N + βN t 2N 2N Gk + λk = 2k else „ « „ « Xc ?» „ «– , X c t t ? t Rk + βk = Lk + βk + wi · y ? Gk−1 + 2λk − i wi 2k 2k i=−c 2k−1 i=−c x2 3 end for ? „ « ? X t βk − i 5 Laplacian pyramid creation ?4 wi · Rk+1 + ? 2k+1 for k = 0, . . . , Ct do ? i∈Pk (t) 2 if k = N then „ « „ « end if t t LN + ΛN = GN + ΛN end for 2N 2N else „ « „ « very strongly lowpass-filtered video version; this means that some t t coarse spatio-temporal structure remains even in regions where all Lk + Λk = Gk + Λk − 2k 2k contrast in higher levels is removed by setting the coefficient map x2 3 to zero. The temporal multiresolution character of the pyramid also ? X „ « , X adds smoothness to changes in the coefficient maps over time; be- ? t Λk − i 5 ?4 wi · Gk+1 + wi cause temporal levels are updated at varying rates, such changes are ? 2k+1 2 ? i∈Pk (t) i∈Pk (t) introduced gradually. Finally, by using different coefficient maps for each level, it is trivially possible to highlight certain frequency end if bands, which is impossible based on a computation of the mean end for alone. 4 Attentional Guidance 3.1 Space-Variant Pyramid Synthesis Pyramid-based rendering of video as a function of gaze has been We now introduce the concept of a coefficient map that indicates shown to have a guiding effect on eye movements. For example, the how spectral energy should be modified in each frequency band at introduction of peripheral temporal blur on a gaze-contingent dis- each pixel of the output image sequence. We denote the coefficient play reduces the number of large-amplitude saccades [Barth et al. map for level k at time t with Wk (t); the Wk have the same spatial 2006], even though the visibility of such blur is low [Dorr et al. resolution as the corresponding Lk , i.e. W/2k by H/2k pixels. 2005]. Using a real-time gaze-contingent version of a spatial Lapla- cian pyramid, locally reducing (spatial) spectral energy at likely fix- To bandpass-filter the image sequence, the Laplacian levels Lk are ation points also changes eye movement characteristics [Dorr et al. simply multiplied pixel-wise with the Wk prior to the recombina- 2008]. tion into Rk . In the following, we will therefore briefly summarize how our gaze Based on the pseudocode (Algorithm 3), we can see that coefficient visualization algorithm can be applied in a learning task to guide maps for different points in time are applied to the different levels the student’s gaze. For further details of this experiment, we refer in each synthesis step of the pyramid; this follows from the itera- to [Jarodzka et al. 2010b]. tive recombination of L into the reconstructed levels. In practice, a more straightforward solution is to apply coefficient maps corre- sponding to one time t to the farthest lookahead item Λk of each 4.1 Perceptual Learning level Lk (i.e. right after subtraction of adjacent Gaussian levels). 
4 Attentional Guidance

Pyramid-based rendering of video as a function of gaze has been shown to have a guiding effect on eye movements. For example, the introduction of peripheral temporal blur on a gaze-contingent display reduces the number of large-amplitude saccades [Barth et al. 2006], even though the visibility of such blur is low [Dorr et al. 2005]. Using a real-time gaze-contingent version of a spatial Laplacian pyramid, locally reducing (spatial) spectral energy at likely fixation points also changes eye movement characteristics [Dorr et al. 2008].

In the following, we will therefore briefly summarize how our gaze visualization algorithm can be applied in a learning task to guide the student's gaze. For further details of this experiment, we refer to [Jarodzka et al. 2010b].

4.1 Perceptual Learning

In many problem domains, experts develop efficient eye movement strategies because the underlying problem requires substantial visual search. Examples include the analysis of radiograms [Lesgold et al. 1988], driving [Underwood et al. 2003], and the classification of fish locomotion [Jarodzka et al. 2010a]. In order to aid novices in acquiring the efficient eye movement strategies of an expert, it is possible to use cueing to guide their attention towards relevant stimulus locations; however, it often remains unclear where and how to cue the user. Van Gog et al. [2009] guided attention during problem-solving tasks by directly displaying the eye movements that an expert had made while performing the same task on modelling examples, but found that this attentional guidance actually decreased novices' subsequent test performance instead of facilitating the learning process. One possible explanation of this effect is that the chosen method of guidance (a red dot at the expert's gaze position that grew in size with fixation duration) was not optimal: the gaze marker covered exactly those visual features it was supposed to highlight, and its dynamic nature might have distracted the observers. To avoid this problem, we here use the space-variant filtering algorithm presented in the previous sections to render instructional videos such that the viewer's attention is guided to those areas that were attended by the expert. However, instead of altering these attended areas, we decrease spatio-temporal contrast (i.e. edge and motion intensity) everywhere else, in order to increase the relative visual saliency of the problem-relevant areas without covering them or introducing artefacts.
4.2 Stimulus Material and Experimental Setup

Eight videos of different fish species with a duration of 4 s each were recorded, depicting different locomotion patterns. They had a spatial resolution of 720 by 576 pixels and a frame rate of 25 frames per second. Four of these videos were shown in a continuous loop to an expert on fish locomotion (a professor of marine zoology), and his eye movements were recorded using a Tobii 1750 remote eye tracker running at 50 Hz. Simultaneously, a spoken didactical explanation of the locomotion pattern (i.e. how different body parts moved) was recorded. These four videos were shown to 72 subjects (university students without prior task experience) in a training phase either as-is, with the expert's eye movements marked by a simple yellow disk at gaze position, or with attentional guidance by the pyramid-based contrast reduction. In the subsequent test or recall phase, the remaining four videos were shown to the subjects without any modification. After presentation, subjects had to apply the knowledge acquired during the training phase and to name and describe the locomotion pattern displayed in each test video; the number of correct answers yielded a performance score.

4.3 Gaze Filtering

Functionally, a sequence of eye movements consists of a series of fixations, during which eye position remains constant, and saccades, during which eye position changes rapidly (smooth pursuit movements can here be understood as fixations where gaze position remains constant on a moving object). In practice, however, the eye position as measured by the eye tracker hardly ever stays constant from one sample to the next: the fixational instability of the oculomotor system, minor head movements, and noise in the camera system of the eye tracker all contribute to the effect that the measured eye position exhibits substantial jitter. If this jitter were replayed to the novice, such constant erratic motion might distract the observer from the very scene that gaze guidance is supposed to highlight. In order to reduce the jitter, raw gaze data was filtered with a temporal Gaussian lowpass filter with a support of 200 ms and a standard deviation of 42 ms.
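A minimal Python sketch of this jitter filter, assuming the 50 Hz sampling rate of the tracker described in Section 4.2; the handling of the sequence borders (here, convolution with mode='same') is our choice rather than specified in the text:

    import numpy as np

    def smooth_gaze(gaze_xy, fs=50.0, support_ms=200.0, sigma_ms=42.0):
        # Temporal Gaussian lowpass of raw gaze samples (n x 2 array of
        # x/y positions): support 200 ms, standard deviation 42 ms.
        half = int(round(support_ms / 2.0 * fs / 1000.0))  # 5 samples at 50 Hz
        taps_ms = np.arange(-half, half + 1) * 1000.0 / fs
        kernel = np.exp(-0.5 * (taps_ms / sigma_ms) ** 2)
        kernel /= kernel.sum()
        # Filter horizontal and vertical traces independently; mode='same'
        # keeps the number of samples unchanged.
        return np.column_stack([np.convolve(gaze_xy[:, d], kernel, mode='same')
                                for d in (0, 1)])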
4.4 Space-Variant Filtering and Colour Removal

A Laplacian pyramid with five levels was used; coefficient maps were created in such a way that the original image sequence was reconstructed faithfully in the fixated area (the weights of all levels during pyramid synthesis were set to 1.0) and spatio-temporal changes were diminished (all level weights set to 0.0) in those areas that the expert had seen only peripherally. On the highest level, the first zone was defined by a radius of 32 pixels around gaze position, and weights were set to 0.0 outside a radius of 256 pixels; these radii approximately corresponded to 1.15 and 9.2 degrees of visual angle, respectively. In parafoveal vision, weights were gradually decreased from 1.0 to 0.0 for a smooth transition, following a Gaussian falloff with a standard deviation of 40 pixels (see Figure 6).

[Figure 6: Eccentricity-dependent resolution map: at the centre of fixation, spectral energy remains the same; energy is increasingly reduced with increasing distance from gaze position. The plot shows relative contrast (0.0 to 1.0) against eccentricity in pixels (-300 to 300).]

Furthermore, these maps were produced not by simply placing a mask at the current gaze position in each video frame; instead, masks for all gaze positions of the preceding and following 300 ms were superimposed, and the coefficient map was then normalized to a maximum of 1.0. During periods of fixation, this superposition had little or no effect; during saccades, however, this procedure elongated the radially symmetric coefficient map along the direction of the saccade. Thus, the observer was able to follow the expert's saccades, and unpredictable large displacements of the unmodified area were prevented. Finally, colour saturation was also removed from non-attended areas, similar to the reduction of spectral energy; here, complete removal of colour started outside a radius of 384 pixels around gaze position, and the Gaussian falloff in the transition area had a standard deviation of 67 pixels.
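In Python, the construction of such a coefficient map could look as follows. How the Gaussian falloff is anchored between the inner and outer radius is our reading of the description above, and all names and the sampling of the ±300 ms gaze window are illustrative assumptions:

    import numpy as np

    def foveation_mask(h, w, gx, gy, r_in=32.0, r_out=256.0, sigma=40.0):
        # Radially symmetric map around one gaze sample: weight 1.0 within
        # r_in, Gaussian falloff (std sigma) in the transition zone, and
        # 0.0 beyond r_out (cf. Figure 6).
        yy, xx = np.mgrid[0:h, 0:w]
        d = np.hypot(xx - gx, yy - gy)
        m = np.exp(-0.5 * ((d - r_in) / sigma) ** 2)
        m[d <= r_in] = 1.0
        m[d >= r_out] = 0.0
        return m

    def coefficient_map(h, w, gaze_window):
        # Superimpose the masks of all gaze samples within +/-300 ms of the
        # current frame and normalize to a maximum of 1.0; during saccades
        # this elongates the map along the saccade direction.
        acc = np.zeros((h, w))
        for gx, gy in gaze_window:
            acc += foveation_mask(h, w, gx, gy)
        peak = acc.max()
        return acc / peak if peak > 0 else acc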
Note that these parameters were determined rather informally to find a reasonable trade-off between a focus that would be too restricted (if the focus were only a few pixels wide, discrimination of relevant features would be impossible) and a focus so wide that it would have no effect (if the unmodified area encompassed the whole stimulus). As such, these parameters are likely to be specific to the stimulus material used here. For a thorough investigation of the visibility of peripherally removed colour saturation on a gaze-contingent display, we refer to Duchowski et al. [2009].

For an example frame, see Figure 1; a demo video is available online at http://www.gazecom.eu/demo-material.

4.5 Results

Previous research has already shown that providing a gaze marker in the highly perceptual task of classifying fish locomotion facilitates perceptual learning: subjects look at relevant movie regions for a longer time and take less time to find relevant locations after stimulus onset, which in turn results in higher performance scores in subsequent tests on novel stimuli [Jarodzka et al. 2009]. The gaze visualization technique presented here does not cover these relevant locations, and subjects' visual search performance improves even beyond that obtained with the simple gaze marker. The time needed to find relevant locations after stimulus onset decreases by 21.26% compared to the gaze marker condition and by 27.53% compared to the condition without any guidance. Moreover, dwell time on the relevant locations increases by 7.25% compared to the gaze marker condition and by 48.82% compared to the condition without any guidance. For a more in-depth analysis, see [Jarodzka et al. 2010b].

5 Conclusion

We have presented a novel algorithm to perform space-variant filtering of a movie based on a spatio-temporal Laplacian pyramid. One application is the visualization of eye movements on videos: spatio-temporal contrast is modified as a function of gaze density, i.e. spectral energy is reduced in regions of low interest. In a validation experiment, subjects watched instructional videos on fish locomotion either with or without a visualization of the eye movements of an expert. We were able to show that on novel test stimuli, subjects who had received such information performed better than subjects who had not benefited from the expert's eye movements during training, and that the gaze visualization technique presented here facilitated learning better than a simple gaze display (a yellow gaze marker). In principle, any visualization technique that reduces the relative visibility of those regions not attended by the expert might have a similar effect; our choice of this particular technique was motivated by our work on eye movement prediction [Dorr et al. 2008; Vig et al. 2009], which shows that spectral energy is a good predictor of eye movements. Ultimately, we intend to use similar techniques in a gaze-contingent fashion in order to guide the gaze of an observer [Barth et al. 2006].

Acknowledgements

Our research has received funding from the European Commission within the project GazeCom (contract no. IST-C-033816) of the 6th Framework Programme. All views expressed herein are those of the authors alone; the European Community is not liable for any use made of the information. Halszka Jarodzka was supported by the Leibniz-Gemeinschaft within the project "Resource-adaptive design of visualizations for supporting the comprehension of complex dynamics in the Natural Sciences". We thank Martin Böhme for comments on a first draft of this manuscript.

References

Barth, E., Dorr, M., Böhme, M., Gegenfurtner, K. R., and Martinetz, T. 2006. Guiding the mind's eye: improving communication and vision by external control of the scanpath. In Human Vision and Electronic Imaging, B. E. Rogowitz, T. N. Pappas, and S. J. Daly, Eds., vol. 6057 of Proc. SPIE. Invited contribution for a special session on Eye Movements, Visual Search, and Attention: a Tribute to Larry Stark.

Böhme, M., Dorr, M., Martinetz, T., and Barth, E. 2006. Gaze-contingent temporal filtering of video. In Proceedings of Eye Tracking Research & Applications (ETRA), 109–115.

Burt, P. J., and Adelson, E. H. 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications 31, 4, 532–540.

Delabarre, E. B. 1898. A method of recording eye-movements. American Journal of Psychology 9, 4, 572–574.

Dorr, M., Böhme, M., Martinetz, T., and Barth, E. 2005. Visibility of temporal blur on a gaze-contingent display. In APGV 2005: ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization, 33–36.

Dorr, M., Vig, E., Gegenfurtner, K. R., Martinetz, T., and Barth, E. 2008. Eye movement modelling and gaze guidance. In Fourth International Workshop on Human-Computer Conversation.

Duchowski, A. T., Bate, D., Stringfellow, P., Thakur, K., Melloy, B. J., and Gramopadhye, A. K. 2009. On spatiochromatic visual sensitivity and peripheral color LOD management. ACM Transactions on Applied Perception 6, 2, 1–18.
Geisler, W. S., and Perry, J. S. 2002. Real-time simulation of arbitrary visual fields. In ETRA '02: Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, ACM, New York, NY, USA, 83–87.

Heminghous, J., and Duchowski, A. T. 2006. iComp: a tool for scanpath visualization and comparison. In APGV '06: Proceedings of the 3rd Symposium on Applied Perception in Graphics and Visualization, ACM, New York, NY, USA, 152–152.

Jarodzka, H., Scheiter, K., Gerjets, P., van Gog, T., and Dorr, M. 2009. How to convey perceptual skills by displaying experts' gaze data. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, N. A. Taatgen and H. van Rijn, Eds., Cognitive Science Society, Austin, TX, 2920–2925.

Jarodzka, H., Scheiter, K., Gerjets, P., and van Gog, T. 2010a. In the eyes of the beholder: How experts and novices interpret dynamic stimuli. Journal of Learning and Instruction 20, 2, 146–154.

Jarodzka, H., van Gog, T., Dorr, M., Scheiter, K., and Gerjets, P. 2010b. Guiding attention guides thought, but what about learning? Eye movements in modeling examples. (submitted)

Lankford, C. 2000. GazeTracker: software designed to facilitate eye movement analysis. In ETRA '00: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, ACM, New York, NY, USA, 51–55.
Lesgold, A., Rubinson, H., Feltovich, P., Glaser, R., Klopfer, D., and Wang, Y. 1988. Expertise in a complex skill: diagnosing X-ray pictures. In The Nature of Expertise, M. T. H. Chi, R. Glaser, and M. Farr, Eds. Hillsdale, NJ: Erlbaum, 311–342.

Nikolov, S. G., Newman, T. D., Bull, D. R., Canagarajah, C. N., Jones, M. G., and Gilchrist, I. D. 2004. Gaze-contingent display using texture mapping and OpenGL: system and applications. In Eye Tracking Research & Applications (ETRA), 11–18.

Ramloll, R., Trepagnier, C., Sebrechts, M., and Beedasy, J. 2004. Gaze data visualization tools: Opportunities and challenges. In IV '04: Proceedings of the Eighth International Conference on Information Visualisation, IEEE Computer Society, Washington, DC, USA, 173–180.

Stone, L. S., Miles, F. A., and Banks, M. S. 2003. Linking eye movements and perception. Journal of Vision 3, 11, i–iii.

Špakov, O., and Miniotas, D. 2007. Visualization of eye gaze data using heat maps. Electronics and Electrical Engineering 2, 55–58.

Špakov, O., and Räihä, K.-J. 2008. KiEV: A tool for visualization of reading and writing processes in translation of text. In ETRA '08: Proceedings of the 2008 Symposium on Eye Tracking Research & Applications, ACM, New York, NY, USA, 107–110.

Underwood, G., Chapman, P., Brocklehurst, N., Underwood, J., and Crundall, D. 2003. Visual attention while driving: Sequences of eye fixations made by experienced and novice drivers. Ergonomics 46, 629–646.

Uz, K. M., Vetterli, M., and LeGall, D. J. 1991. Interpolative multiresolution coding of advanced television with compatible subchannels. IEEE Transactions on Circuits and Systems for Video Technology 1, 1, 86–99.

Van Gog, T., Jarodzka, H., Scheiter, K., Gerjets, P., and Paas, F. 2009. Effects of attention guidance during example study by showing students the models' eye movements. Computers in Human Behavior 25, 785–791.

Velichkovsky, B., Pomplun, M., and Rieser, J. 1996. Attention and communication: Eye-movement-based research paradigms. In Visual Attention and Cognition, W. H. Zangemeister, H. S. Stiehl, and C. Freksa, Eds. Amsterdam, Netherlands: Elsevier Science, 125–154.

Vig, E., Dorr, M., and Barth, E. 2009. Efficient visual coding and the predictability of eye movements on natural movies. Spatial Vision 22, 5, 397–408.

Wooding, D. S. 2002. Fixation maps: Quantifying eye-movement traces. In ETRA '02: Proceedings of the 2002 Symposium on Eye Tracking Research & Applications, ACM, New York, NY, USA, 31–36.

Yarbus, A. L. 1967. Eye Movements and Vision. Plenum Press, New York.