Identifying Objects in
Images from Analyzing the
User's Gaze Movements
for Provided Tags
Tina Walber, Ansgar Scherp, Steffen Staab
University of Koblenz-Landau, Koblenz, Germany

Multimedia Modeling Conference
Klagenfurt, Austria
January 4-6, 2012
Motivation: Image Tagging
[Figure: example photo with tagged regions: tree, girl, car, store, people, sidewalk]
• Find specific objects in images
• Analyzing the user's gaze path only
 T. Walber, A. Scherp, S. Staab – Identifying Objects in Images                     2 of 21
Research Questions


1. Which fixation measure is best for finding the correct
   image region given a specific tag?

2. Can we differentiate two regions in the
   same image?


3 Steps Conducted by Users




• Look at red blinking dot
• Decide whether tag can be seen (“y” or “n”)
Dataset
• LabelMe community images
  • Manually drawn polygons
  • Regions annotated with tags
• 182,657 images (August 2010)

• High-quality segmentation and annotation
• Used as ground truth

Experiment Images and Tags
• Randomly selected 51 images
• Each contains at least two tagged regions

• Created two tag sets for the 51 images
• Each image is assigned two tags (one per set)

• Tags are either “true” or “false”
  • “true” → object described by tag can be seen
  • “false” → object cannot be seen in the image
• Purpose: keep subjects concentrated during the experiment
Subjects & Experiment System
• 20 subjects
  • 16 male, 4 female (age: 23-40, mean 29.6)
  • Undergrads (6), PhD (12), office clerks (2)

• Experiment system
  • Simple web page in Internet Explorer
  • Standard notebook, resolution 1680x1050
  • Tobii X60 eye tracker (60 Hz, 0.5° accuracy)

Conducting the Experiment
• Each user looked at 51 tag-image pairs
• First tag-image pair dismissed

• 94.3% correct answers
• Equal for true/false tags
• ~3 s until decision (average)

• 85% of users strongly agreed or agreed that
  they felt comfortable during the experiment
  • The eye tracker had little influence on comfort
Pre-processing of Eye-tracking Data
• Obtained 547 gaze paths from 20 users where
  • Users gave correct answers
  • Image has “true” tag assigned
• Fixation extraction
  • Tobii Studio's velocity & distance thresholds
  • Fixation: focus on a particular point on the screen

• Requirement: at least one fixation inside or near the correct region
• 476 (87%) gaze paths fulfill this requirement

Analysis of Gaze Fixations (1)
• Applied 13 fixation measures on the 476 paths
  (2 new, 7 standard Tobii, 4 from the literature)

• Fixation measure: function on users' gaze paths
• Calculated for each image region, over all users
  viewing the same tag-image pair




Considered Fixation Measures
Nr  Name                     Favorite region r                         Origin
1   firstFixation            No. of fixations before 1st on r          Tobii
2   secondFixation           No. of fixations before 2nd on r          [13]
3   fixationsAfter           No. of fixations after last on r          [4]
4   fixationsBeforeDecision  fixationsAfter, but before decision       New
5   fixationsAfterDecision   fixationsBeforeDecision and after         New
6   fixationDuration         Total duration of all fixations on r      Tobii
7   firstFixationDuration    Duration of first fixation on r           Tobii
8   lastFixationDuration     Duration of last fixation on r            [11]
9   fixationCount            Number of fixations on r                  Tobii
10  maxVisitDuration         Max time first fixation until outside r   Tobii
11  meanVisitDuration        Mean time first fixation until outside r  Tobii
12  visitCount               No. of fixations until outside r          Tobii
13  saccLength               Saccade length, before fixation on r      [6]
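
A few of the measures in the table can be sketched as follows. This is an illustrative approximation, not Tobii Studio's implementation: a gaze path is assumed to be a time-ordered list of hypothetical `(region, duration_ms)` fixations, and a visit duration is approximated by summing the fixation durations of one uninterrupted stay on a region, ignoring saccade time.

```python
from itertools import groupby


def first_fixation(path, r):
    """Number of fixations before the first fixation on r (None if never fixated)."""
    for i, (region, _) in enumerate(path):
        if region == r:
            return i
    return None


def fixation_count(path, r):
    """Number of fixations on r."""
    return sum(1 for region, _ in path if region == r)


def fixation_duration(path, r):
    """Total duration of all fixations on r."""
    return sum(d for region, d in path if region == r)


def visit_durations(path, r):
    """Duration of each visit, i.e. each run of consecutive fixations on r."""
    visits = []
    for region, run in groupby(path, key=lambda f: f[0]):
        if region == r:
            visits.append(sum(d for _, d in run))
    return visits


def mean_visit_duration(path, r):
    """Mean visit duration on r, 0.0 if r was never visited."""
    v = visit_durations(path, r)
    return sum(v) / len(v) if v else 0.0


# Hypothetical gaze path: region label and fixation duration in milliseconds.
path = [("sky", 180), ("car", 200), ("car", 300), ("tree", 150), ("car", 100)]

print(first_fixation(path, "car"))       # 1
print(fixation_count(path, "car"))       # 3
print(fixation_duration(path, "car"))    # 600
print(visit_durations(path, "car"))      # [500, 100]
print(mean_visit_duration(path, "car"))  # 300.0
```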
Analysis of Gaze Fixations (2)




• For every image region (b) the fixation
  measure is calculated over all gaze paths (c)
• Results are summed up per region
• Regions are ordered according to the fixation measure
• If favorite region (d) and tag (a) match, the result is a
  true positive (tp), otherwise a false positive (fp)
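
The aggregation described above can be sketched in a few lines. This is a minimal illustration under assumptions: the data layout (`regions`, `tagged_region`, `paths`) is hypothetical, total fixation duration stands in for any of the 13 measures, and a higher score is assumed to be better (measures such as firstFixation would need the opposite ordering).

```python
from collections import defaultdict


def total_duration(path, region):
    """Stand-in fixation measure: summed fixation duration on a region."""
    return sum(d for r, d in path if r == region)


def favorite_region(paths, regions, measure):
    """Sum the measure over all gaze paths per region and return the top region."""
    scores = defaultdict(float)
    for region in regions:
        for path in paths:
            scores[region] += measure(path, region)
    ranked = sorted(regions, key=lambda r: scores[r], reverse=True)
    return ranked[0]


def precision(pairs, measure):
    """Precision P = tp / (tp + fp) over all tag-image pairs."""
    tp = fp = 0
    for pair in pairs:
        fav = favorite_region(pair["paths"], pair["regions"], measure)
        if fav == pair["tagged_region"]:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp)


# Hypothetical tag-image pairs; fixations are (region, duration_ms).
pairs = [
    {"regions": ["car", "tree"], "tagged_region": "car",
     "paths": [[("car", 300), ("tree", 100)], [("tree", 200), ("car", 250)]]},
    {"regions": ["girl", "store"], "tagged_region": "girl",
     "paths": [[("store", 400), ("girl", 150)]]},
]
print(precision(pairs, total_duration))  # 0.5
```

In the toy data the first pair yields a true positive (car: 550 ms vs. tree: 300 ms) and the second a false positive (store: 400 ms vs. girl: 150 ms).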
Precision per Fixation Measure
[Figure: precision P and sum of tp and fp assignments per fixation measure; labeled measures: meanVisitDuration, fixationsBeforeDecision, lastFixationDuration, fixationDuration]
Adding Boundaries and Weights
• Take eye-tracker inaccuracies into account
• Extension of region boundaries by 13 pixels

• Larger regions are more likely to be fixated
• Give extra weight to regions smaller than 5% of the image size
• Precision of meanVisitDuration increases to P = 0.67
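
One way to realize the two adjustments is sketched below, under stated assumptions. Regions are taken to be polygons given as lists of `(x, y)` pixel coordinates, a fixation counts as a hit if it lies inside the polygon or within 13 pixels of its outline, and the weight factor `boost` is a hypothetical value because the slides do not state the weight that was used.

```python
import math


def point_in_polygon(x, y, poly):
    """Ray-casting test: is (x, y) strictly inside the polygon?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(x, y, x1, y1, x2, y2):
    """Euclidean distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def hits_region(x, y, poly, margin=13):
    """Fixation hits the region if inside it or within `margin` pixels of its outline."""
    if point_in_polygon(x, y, poly):
        return True
    n = len(poly)
    return any(
        dist_to_segment(x, y, *poly[i], *poly[(i + 1) % n]) <= margin
        for i in range(n)
    )


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def region_weight(poly, image_area, small_share=0.05, boost=2.0):
    """Weight for a region; `boost` is a hypothetical factor, not from the slides."""
    return boost if polygon_area(poly) < small_share * image_area else 1.0


square = [(100, 100), (200, 100), (200, 200), (100, 200)]
print(hits_region(150, 150, square))        # True: inside the region
print(hits_region(210, 150, square))        # True: 10 px outside, within the margin
print(hits_region(220, 150, square))        # False: 20 px outside
print(region_weight(square, 1000 * 800))    # 2.0: region covers 1.25% of the image
```

A region's fixation measure would then be multiplied by `region_weight` before the regions are ranked.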
Examples: Tag-Region-Assignments




Comparison with Baselines




• Naïve baseline: largest region r is favorite
• Random baseline: randomly select favorite r

• Gaze / Gaze* significantly better (χ², α<0.001)
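
The two baselines can be sketched as follows. The data layout is hypothetical: each image is assumed to carry a dictionary of region areas in pixels and the region that belongs to the tag. The significance test from the slide is not reproduced here.

```python
import random


def naive_baseline(region_areas):
    """Naive baseline: the largest region is the favorite."""
    return max(region_areas, key=region_areas.get)


def random_baseline(region_areas, rng):
    """Random baseline: pick the favorite region uniformly at random."""
    return rng.choice(sorted(region_areas))


def baseline_precision(images, pick):
    """Share of images for which the picked region matches the tagged region."""
    hits = sum(1 for img in images if pick(img["areas"]) == img["tagged_region"])
    return hits / len(images)


# Hypothetical images: region areas in pixels and the region the tag refers to.
images = [
    {"areas": {"sky": 50000, "car": 8000}, "tagged_region": "car"},
    {"areas": {"building": 60000, "tree": 9000}, "tagged_region": "building"},
]

rng = random.Random(0)  # fixed seed so the random baseline is repeatable
naive_p = baseline_precision(images, naive_baseline)
random_p = baseline_precision(images, lambda areas: random_baseline(areas, rng))
print(naive_p)   # 0.5
print(random_p)  # depends on the seed
```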

Effect of Gaze Path Aggregation
[Figure: precision P versus number of gaze paths used]

• Aggregation of precision P for Gaze*

• A single user's gaze path is still significantly better than both baselines
  (χ²: naive baseline with α<0.001, random baseline with α<0.002)
Research Questions


1. Which fixation measure is best for finding the correct
   image region given a specific tag?
   • meanVisitDuration with precision of 67%

2. Can we differentiate two regions in the
   same image?


Differentiate Two Objects
• Use second tag set to identify different objects
  in the same image
• 16 images (of our 51) have two “true” tags
• 6 images had two correct regions identified
  • Proportion of 38%

• Average precision for single object is 67%
  • Correct tag assignment for two images: 44%
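
The proportions on this slide can be checked with a short calculation. The second part is only one possible reading: it assumes that the 44% figure is close to the chance of two independent assignments both being correct at a single-object precision of 0.67; the slides do not confirm that this is how the figure was derived.

```python
images_with_two_true_tags = 16
images_with_both_regions_correct = 6

share = images_with_both_regions_correct / images_with_two_true_tags
print(f"{share:.1%}")  # 37.5%, reported on the slide as 38%

# Assumption: two independent assignments, each correct with P = 0.67.
p_single = 0.67
p_both_if_independent = p_single ** 2
print(f"{p_both_if_independent:.3f}")  # 0.449
```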


Correctly Differentiated Objects




Research Questions


1. Which fixation measure is best for finding the correct
   image region given a specific tag?
   • meanVisitDuration with precision of 67%

2. Can we differentiate two regions in the
   same image?
   • Accuracy of 38%
Acknowledgement: This research was partially supported by the EU projects
Petamedia (FP7-216444) and SocialSensor (FP7-287975).
Influence of Red Dot




• First 5 fixations, over all subjects and all images
Experiment Data Cleaning
• Manually replaced images with
a) Tags that are incomprehensible, nonsensical, or require
   expert knowledge
b) Tags that refer to multiple regions, not all of which are
   drawn into the image (e.g., bicycle)
c) Obstructed objects (e.g., a bicycle behind a car)
d) “False” tags that actually refer to a visible part of
   the image and thus were “true” tags


