Research & Development




   Text vs. Speech: A Comparison of Tagging Input Modalities for Camera Phones



      Mauro Cherubini, Xavier Anguera, 
     Nuria Oliver, and Rodrigo de Oliveira
people do not want to tag
             their pictures
intro → hypotheses → methodology → results → implications
research question:

 Assuming that users are willing to
 input at least one tag, which input
 modality best supports tagging pictures
 and later retrieving them?


hypothesis 1

   Speech is preferred to text as an
   annotation mechanism on mobile
     phones (objective measure)

Support: 
- Mitchard and Winkles (2002)

hypothesis 1-bis

  Speech annotations are preferred by
users even if this means spending more
 time on the task (subjective measure)

 Support: 
 - Perakakis and Potamianos (2008)

hypothesis 2

  The longer the tag, the larger the
  advantage of voice over text for
annotating pictures on mobile phones

Support: 
- Hauptmann and Rudnicky (1990)

hypothesis 3

 Retrieving pictures on mobile phones
with speech is not faster than with text
          (objective measure)

 Support: 
 - Mills et al. (2000)

the user study

   field study (4 weeks) → controlled experiment (T1 - T2 - T3 - T4)

  3 experimental conditions:
     a. Speech only
     b. Text only
     c. Speech and Text

MAMI




features of MAMI
                         

    •  processing is done entirely on the mobile
       phone
    •  speech is not transcribed
    •  to compare the waveforms of the audio tags,
       MAMI uses the Dynamic Time Warping (DTW)
       algorithm
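The slide does not describe how MAMI implements DTW. As a minimal sketch, assuming each audio tag has already been turned into a sequence of per-frame feature vectors (the feature type and the Euclidean frame distance are assumptions), classic DTW shows how two recordings of different length can be compared without transcribing them. `rank_audio_tags` is a hypothetical helper that orders stored tags by distance to a spoken query.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A recording is assumed to be a sequence of per-frame feature vectors
# (hypothetical representation; the slide does not name the features).
Frames = Sequence[Sequence[float]]


def dtw_distance(a: Frames, b: Frames) -> float:
    """Classic dynamic time warping distance between two frame sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between two frames (Euclidean, an assumption)
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor steps
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_audio_tags(query: Frames, tags: Dict[str, Frames]) -> List[Tuple[str, float]]:
    """Hypothetical helper: order stored audio tags by DTW distance to a spoken query."""
    scored = [(picture_id, dtw_distance(query, frames)) for picture_id, frames in tags.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    query = [[1.0], [2.0], [3.0]]
    stored = {"beach.jpg": [[3.0], [3.0], [1.0]], "garden.jpg": [[1.0], [1.0], [2.0], [3.0]]}
    print(rank_audio_tags(query, stored))
    # prints [('garden.jpg', 0.0), ('beach.jpg', 5.0)]
```

The stretched recording "garden.jpg" aligns perfectly with the query because DTW lets one frame match several frames of the other sequence. The full cost table grows with the product of the two lengths, which is the usual trade-off of this simple form.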


task 1: remember the tag
            stimulus
                    retrieval




Pictures taken during the field trial


task 2: remember the context
          stimulus
                      retrieval

      TASK 2
      PICTURE 1

      three little bushes
      Garden
      Tree
      Stairs




task 3: remember the picture
          stimulus
                      retrieval




                      Text
  Audio tags were converted into
    textual tags and vice versa

task 4: remember the sequence
        assignment
                      retrieval

     TASK 4

     Three pictures among
     the oldest and three 
     pictures among the 
     newest.




metrics

     •  time to completion
     •  false positives
     •  retrieval errors
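The slide names the three metrics without defining them. The Python sketch here aggregates them per experimental condition under assumed definitions: time to completion is end time minus start time, false positives are returned pictures other than the target, and a retrieval error is a trial in which the picture finally picked is not the target. The `Trial` record, its field names, and the condition labels are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Trial:
    # Hypothetical log record for one retrieval trial
    condition: str       # e.g. "speech", "text", "both" (assumed labels)
    start_s: float       # task start, seconds
    end_s: float         # task end, seconds
    returned: List[str]  # picture ids shown as search results
    target: str          # picture the participant was asked to find
    selected: str        # picture the participant finally picked


def summarize(trials: List[Trial]) -> Dict[str, Dict[str, float]]:
    """Mean of each (assumed) metric, grouped by experimental condition."""
    groups: Dict[str, List[Trial]] = {}
    for t in trials:
        groups.setdefault(t.condition, []).append(t)
    return {
        cond: {
            "time_to_completion_s": mean(t.end_s - t.start_s for t in ts),
            # returned pictures that are not the target
            "false_positives": mean(sum(p != t.target for p in t.returned) for t in ts),
            # share of trials in which the wrong picture was picked
            "retrieval_error_rate": mean(float(t.selected != t.target) for t in ts),
        }
        for cond, ts in groups.items()
    }
```

Grouping by condition keeps the three metrics directly comparable across the Speech only, Text only, and Speech and Text groups.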


results H1




results H1-bis
 All participants in the BOTH group felt that tagging
 with text was more effective than tagging with voice.

   Voice: 3.33 [0.81], Text: 4.34 [0.81] (Mean [SD])
    1 = completely agree; 5 = completely disagree




results H2




results H3




results H3 - continued
take away 1: 
       speech is not a given

the advantage of audio as an input modality for tagging
       pictures on mobile phones is not a given


                           why?
                  1. retrieval precision
                        2. privacy

take away 2: 
              input mistakes
     users correct text input mistakes immediately;
  by contrast, mistakes in audio recordings are
                corrected less often




take away 3: 
                  memory

      speech does not help users memorize the tags




implication 1:
   allow multiple modalities




                       © Pixar, 2008


implication 2:
    enable audio inspection




implication 3: 
enable modality synesthesia




                       © Disney, 1940




              end
              thanks

        martigan@gmail.com
          mauro@tid.es


http://www.i-cherubini.it/mauro/blog/
  http://research.tid.es/multimedia/
