Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags
1. Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags
Tina Walber, Ansgar Scherp, Steffen Staab
University of Koblenz-Landau, Koblenz, Germany
Multimedia Modeling Conference
Klagenfurt, Austria
January 4-6, 2012
2. Motivation: Image Tagging
tree
girl
car
store
people
sidewalk
Find specific objects in images
Analyzing the user's gaze path only
T. Walber, A. Scherp, S. Staab – Identifying Objects in Images 2 of 21
3. Research Questions
1. Best fixation measure to find the correct
image region given a specific tag?
2. Can we differentiate two regions in the
same image?
4. 3 Steps Conducted by Users
Look at red blinking dot
Decide whether tag can be seen (“y” or “n”)
5. Dataset
LabelMe community images
Manually drawn polygons
Regions annotated with tags
182,657 images (August 2010)
High-quality segmentation and annotation
Used as ground truth
6. Experiment Images and Tags
Randomly selected 51 images
Contain at least two tagged regions
Created two tag sets for the 51 images
Each image is assigned two tags (one per set)
Tags are either “true” or “false”
“true” object described by tag can be seen
“false” object cannot be seen on the image
Keep subjects concentrated during experiment
7. Subjects & Experiment System
20 subjects
16 male, 4 female (age: 23-40, Ø=29.6)
Undergrads (6), PhD (12), office clerks (2)
Experiment system
Simple web page in Internet Explorer
Standard notebook, resolution 1680x1050
Tobii X60 eye-tracker (60 Hz, 0.5° accuracy)
8. Conducting the Experiment
Each user looked at 51 tag-image-pairs
First tag-image-pair dismissed
94.3% correct answers
Equal for true/false tags
~3s until decision (average)
85% of users strongly agreed or agreed that
they felt comfortable during the experiment
Eye tracker had little influence on comfort
9. Pre-processing of Eye-tracking Data
Obtained 547 gaze paths from 20 users where
Users gave correct answers
Image has “true” tag assigned
Fixation extraction
Tobii Studio's velocity & distance thresholds
Fixation: focus on particular point on screen
One fixation inside or near the correct region
476 (87%) gaze paths fulfill this requirement
10. Analysis of Gaze Fixations (1)
Applied 13 fixation measures on the 476 paths
(2 new, 7 standard Tobii, 4 literature)
Fixation measure: function on users' gaze paths
Calculated for each image region, over all users
viewing the same tag-image-pair
11. Considered Fixation Measures
Nr  Name                     Favorite region r                         Origin
1   firstFixation            No. of fixations before 1st on r          Tobii
2   secondFixation           No. of fixations before 2nd on r          [13]
3   fixationsAfter           No. of fixations after last on r          [4]
4   fixationsBeforeDecision  fixationsAfter, but before decision       New
5   fixationsAfterDecision   fixationsBeforeDecision and after         New
6   fixationDuration         Total duration of all fixations on r      Tobii
7   firstFixationDuration    Duration of first fixation on r           Tobii
8   lastFixationDuration     Duration of last fixation on r            [11]
9   fixationCount            Number of fixations on r                  Tobii
10  maxVisitDuration         Max time first fixation until outside r   Tobii
11  meanVisitDuration        Mean time first fixation until outside r  Tobii
12  visitCount               No. of fixations until outside r          Tobii
13  saccLength               Saccade length, before fixation on r      [6]
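A few of the measures listed above can be sketched for a single gaze path as follows. This is a minimal illustration, not the authors' implementation: the `Fixation` record, its `region` label, and all function names are assumptions. A "visit" is assumed here to be a run of consecutive fixations on the same region, and its duration is approximated by the sum of the fixation durations in that run.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional


@dataclass
class Fixation:
    region: Optional[str]  # hypothetical label of the region hit, None if no region
    duration_ms: float


def fixation_count(path: List[Fixation], r: str) -> int:
    """Measure 9: number of fixations on region r."""
    return sum(1 for f in path if f.region == r)


def fixation_duration(path: List[Fixation], r: str) -> float:
    """Measure 6: total duration of all fixations on region r."""
    return sum(f.duration_ms for f in path if f.region == r)


def first_fixation(path: List[Fixation], r: str) -> Optional[int]:
    """Measure 1: number of fixations before the first one on r (None if r is never fixated)."""
    for i, f in enumerate(path):
        if f.region == r:
            return i
    return None


def visit_durations(path: List[Fixation], r: str) -> List[float]:
    """Durations of all visits to r (assumption: a visit is a run of consecutive fixations on r)."""
    return [sum(f.duration_ms for f in run)
            for region, run in groupby(path, key=lambda f: f.region)
            if region == r]


def mean_visit_duration(path: List[Fixation], r: str) -> float:
    """Measure 11: mean visit duration on r, 0.0 if r is never visited."""
    visits = visit_durations(path, r)
    return sum(visits) / len(visits) if visits else 0.0


def max_visit_duration(path: List[Fixation], r: str) -> float:
    """Measure 10: longest visit duration on r, 0.0 if r is never visited."""
    return max(visit_durations(path, r), default=0.0)


# Made-up example path: tree, car, car, no region, car
example_path = [
    Fixation("tree", 200.0),
    Fixation("car", 150.0),
    Fixation("car", 250.0),
    Fixation(None, 100.0),
    Fixation("car", 300.0),
]
print(fixation_count(example_path, "car"), mean_visit_duration(example_path, "car"))
```

In the example path the region "car" is visited twice, once with two consecutive fixations and once with a single fixation, so the count-based and visit-based measures give different views of the same path.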
12. Analysis of Gaze Fixations (2)
For every image region (b) the fixation
measure is calculated over all gaze paths (c)
Results are summed up per region
Regions ordered according to fixation measure
If favorite region (d) and tag (a) match, result is
true positive (tp), otherwise false positive (fp)
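The ranking and evaluation steps above can be sketched as follows. The data layout (one dict of per-region measure values per gaze path) and the function names are assumptions for illustration. The sketch treats the region with the highest summed value as the favorite; for measures where a smaller value is better (e.g., firstFixation) the ordering would presumably be reversed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One gaze path is represented by its per-region measure values (hypothetical layout).
PathScores = Dict[str, float]


def favorite_region(paths: List[PathScores]) -> str:
    """Sum the measure per region over all gaze paths and return the top-ranked region."""
    totals: Dict[str, float] = defaultdict(float)
    for scores in paths:
        for region, value in scores.items():
            totals[region] += value
    return max(totals, key=totals.get)


def precision(pairs: List[Tuple[str, List[PathScores]]]) -> float:
    """pairs holds (tag, gaze paths) per tag-image pair; returns tp / (tp + fp).

    Assumes at least one pair and at least one scored region per pair.
    """
    tp = sum(1 for tag, paths in pairs if favorite_region(paths) == tag)
    fp = len(pairs) - tp
    return tp / (tp + fp)


# Made-up values: the first pair yields a true positive, the second a false positive.
example_pairs = [
    ("car", [{"car": 400.0, "tree": 150.0}, {"car": 300.0, "tree": 350.0}]),
    ("girl", [{"girl": 100.0, "store": 250.0}]),
]
print(precision(example_pairs))
```

Because every tag-image pair produces exactly one favorite region, each pair counts either as tp or as fp, which is why fp is computed as the remainder.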
13. Precision per Fixation Measure
[Figure: precision P and sum of tp and fp assignments per fixation measure; labeled measures: meanVisitDuration, fixationsBeforeDecision, lastFixationDuration, fixationDuration]
14. Adding Boundaries and Weights
Take eye-tracker inaccuracies into account
Extension of region boundaries by 13 pixels
Larger regions more likely to be fixated
Give weight to regions < 5% of image size
meanVisitDuration increases to P = 0.67
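One way to realize the two adjustments above is sketched below. It assumes that "extending the boundary by 13 pixels" means counting a fixation as a hit if it lies inside the polygon or within 13 pixels of its outline. The slides do not state how small regions are weighted, so `small_region_weight` and its `boost` factor are placeholders, not values from the study.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _edges(poly: List[Point]):
    """Yield consecutive vertex pairs, closing the polygon."""
    return zip(poly, poly[1:] + poly[:1])


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a horizontal ray starting at p."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in _edges(poly):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def hits_region(p: Point, poly: List[Point], margin_px: float = 13.0) -> bool:
    """True if p lies inside the polygon or within margin_px of its outline."""
    if point_in_polygon(p, poly):
        return True
    return any(distance_to_segment(p, a, b) <= margin_px for a, b in _edges(poly))


def polygon_area(poly: List[Point]) -> float:
    """Polygon area via the shoelace formula."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _edges(poly))
    return abs(s) / 2.0


def small_region_weight(poly: List[Point], image_w: int, image_h: int,
                        boost: float = 2.0) -> float:
    """Return boost for regions below 5% of the image area, else 1.0 (boost is a placeholder)."""
    return boost if polygon_area(poly) < 0.05 * image_w * image_h else 1.0


square = [(100.0, 100.0), (200.0, 100.0), (200.0, 200.0), (100.0, 200.0)]
print(hits_region((210.0, 150.0), square), hits_region((220.0, 150.0), square))
```

The weight would be multiplied into a region's fixation-measure value before ranking, so that small regions are not systematically outranked by large ones.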
16. Comparison with Baselines
Naïve baseline: largest region r is favorite
Random baseline: randomly select favorite r
Gaze / Gaze* significantly better (χ², α<0.001)
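The two baselines can be sketched in a few lines. The `region_areas` mapping from region label to area in pixels is a hypothetical input format and the areas in the example are made up. The significance test is not reproduced here because the slides do not give the underlying counts.

```python
import random
from typing import Dict, Optional


def naive_baseline(region_areas: Dict[str, float]) -> str:
    """Naive baseline: the largest region is chosen as favorite."""
    return max(region_areas, key=region_areas.get)


def random_baseline(region_areas: Dict[str, float],
                    rng: Optional[random.Random] = None) -> str:
    """Random baseline: a favorite is drawn uniformly from all regions of the image."""
    rng = rng or random.Random()
    return rng.choice(sorted(region_areas))


# Made-up region areas in pixels for one image
example_areas = {"sidewalk": 52000.0, "car": 18000.0, "girl": 4000.0}
print(naive_baseline(example_areas))
print(random_baseline(example_areas, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the random baseline reproducible across runs.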
17. Effect of Gaze Path Aggregation
[Figure: aggregated precision P for Gaze* against the number of gaze paths used]
Single user still significantly better (χ² for
naive with α<0.001 and random with α<0.002)
18. Research Questions
1. Best fixation measure to find the correct
image region given a specific tag?
meanVisitDuration with precision of 67%
2. Can we differentiate two regions in the
same image?
19. Differentiate Two Objects
Use second tag set to identify different objects
in the same image
16 images (of our 51) have two “true” tags
6 images had two correct regions identified
Proportion of 38%
Average precision for single object is 67%
Correct tag assignment for two images: 44%
21. Research Questions
1. Best fixation measure to find the correct
image region given a specific tag?
meanVisitDuration with precision of 67%
2. Can we differentiate two regions in the
same image?
Accuracy of 38%
Acknowledgement: This research was partially supported by the EU projects
Petamedia (FP7-216444) and SocialSensor (FP7-287975).
22. Influence of Red Dot
First 5 fixations, over all subjects and all images
23. Experiment Data Cleaning
Manually replaced images where
a) the tag is incomprehensible, requires expert knowledge, or is nonsense
b) the tag refers to multiple regions, but not all are
drawn into the image (e.g., bicycle)
c) the object is obstructed (e.g., a bicycle behind a car)
d) the “false” tag actually refers to a visible part of
the image and is thus a “true” tag