SlideShare une entreprise Scribd logo
1  sur  8
The Speech Recognition Virtual
             Kitchen

Florian Metze and Eric Fosler-Lussier

INTERSPEECH 2012
Multimedia Retrieval and Summarization



   “Traditional” Multimedia Retrieval and Summarization
      Select frames and shots that are most informative

      Save user time by avoiding repetitions etc. (BBC Rushes Summarization)

   Recent Advances in Natural Language Processing
      Replace “extractive” summarization of text with “abstractive” techniques

      Use Statistical Machine Translation as a general technique to convert long
       “foreign” symbol sequence into concise English text

   Would this not apply nicely to Multi-media?
      Easily have huge amounts of data

      “Skimming”, “tagging” with keywords, or “liking” clearly doesn’t do justice to
       relevance, complexity and potential of Multi-media
What’s Next?



   Generate more detailed synopses, add temporal aspects, properties

   Add more modalities (sounds, etc.)

   “What is in these videos?”

      Text could summarize multiple videos at once

      Attract interest to (groups of) videos

   “Why is this video relevant? Or different?”

      Text can relate a retrieved video to the query

      Text can potentially flag false alarms, outliers
 Thank You!
Feature Definition


Event name: Changing a vehicle tire
                                                                                                                   Extract candidates for relevant
                                                                                                                    objects from “Event Kit”
Definition: One or more people work to replace a tire on a
vehicle                                                                                                            Determine salient objects from MED
                                                                                                                    features
Explication: A vehicle is any device, motorized or not, used to transport people and/or other
items. Tires are ring-shaped inflated objects, usually made of rubber, that fit over the wheel of a                Intersect both sets
vehicle. The process for replacing a tire includes removing the existing tire and installing the new tire
onto the wheel of the vehicle. Tires typically are replaced because they are damaged or worn down. If
a tire is damaged and loses air pressure as a result, it is called a "flat tire". Generally the driver of the         Use ontologies to resolve
vehicle with a flat tire will stop the vehicle as soon as possible and replace the affected tire with a
temporary tire called a "spare tire”, which may be stored elsewhere on/in the vehicle. In other cases,
the tire may be changed not by the vehicle operator, but by a professional (e.g. a mechanic) who may
                                                                                                                        synonyms, etc
use dedicated tools and work in a repair shop or similar setting.

                                                                                                                      Combine data-driven and
                                                                                                                        knowledge based sources
Evidential description:

 scene: garage, outdoors, street, parking lot
MER Approach: Feature Extraction


   What to mention:

        Take visual evidence (for 100s of classes) for video

        Re-rank using manually determined “importance”

   How to mention:

        Present as corroborating or contraindicative
         evidence

        Place additional constraints

   Similar for ASR hypotheses

        Based on unigrams for now

   Move from “hand-engineered” to automatic
    methods

        Now: similar to Tf/ Idf measure, BOW features

        Future: Bipartite graph matching to determine “good”
         concepts
INTERSPEECH AFTERPARTY
“Speech Recognition Virtual Kitchen”


   Broadway 3 & 4

   4:30pm on Thursday, September 13

   We want your input to grow this idea
    further – show your support

   Come and see more demos of VMs
                                              Kitchen Server                  Third-Party Server
   Discuss with potential users or content                         SW/        SW/
                                                VM          VM      Data       Data
    providers from outside the speech                               Repo       Repo



    community                                               ➀             ➁


                                              Host PC                         Virtual Machine
   Present your own ideas in a short                                         • So ware and Data
                                                                Virtual       • Example Scripts
                                               Data                           • Tutorials
    presentation(?)                                     ➂
                                                                Machine
                                                                              • Reference/ Sample Log-files
                                                                              • …


   http://www.speechkitchen.org/
http://speechkitchen.cse.ohio-state.edu/

Contenu connexe

En vedette

ARF @ MediaEval 2012: Multimodal Video Classification
ARF @ MediaEval 2012: Multimodal Video ClassificationARF @ MediaEval 2012: Multimodal Video Classification
ARF @ MediaEval 2012: Multimodal Video ClassificationMediaEval2012
 
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...MediaEval2012
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskMediaEval2012
 
CUHK System for the Spoken Web Search task at Mediaeval 2012
CUHK System for the Spoken Web Search task at Mediaeval 2012CUHK System for the Spoken Web Search task at Mediaeval 2012
CUHK System for the Spoken Web Search task at Mediaeval 2012MediaEval2012
 
How INRIA identifies Geographic Location of a Video
How INRIA identifies Geographic Location of a VideoHow INRIA identifies Geographic Location of a Video
How INRIA identifies Geographic Location of a VideoMediaEval2012
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012MediaEval2012
 
The CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsThe CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsMediaEval2012
 
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...MediaEval2012
 
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012MediaEval2012
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...MediaEval2012
 
CERTH @ MediaEval 2012 Social Event Detection Task
CERTH @ MediaEval 2012 Social Event Detection TaskCERTH @ MediaEval 2012 Social Event Detection Task
CERTH @ MediaEval 2012 Social Event Detection TaskMediaEval2012
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskMediaEval2012
 
Overview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskOverview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskMediaEval2012
 
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval2012
 
BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012
BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012
BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012MediaEval2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term DetectionARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term DetectionMediaEval2012
 
QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo CollectionsQMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo CollectionsMediaEval2012
 

En vedette (18)

ARF @ MediaEval 2012: Multimodal Video Classification
ARF @ MediaEval 2012: Multimodal Video ClassificationARF @ MediaEval 2012: Multimodal Video Classification
ARF @ MediaEval 2012: Multimodal Video Classification
 
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
 
CUHK System for the Spoken Web Search task at Mediaeval 2012
CUHK System for the Spoken Web Search task at Mediaeval 2012CUHK System for the Spoken Web Search task at Mediaeval 2012
CUHK System for the Spoken Web Search task at Mediaeval 2012
 
How INRIA identifies Geographic Location of a Video
How INRIA identifies Geographic Location of a VideoHow INRIA identifies Geographic Location of a Video
How INRIA identifies Geographic Location of a Video
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
The CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsThe CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and Onwards
 
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
 
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
 
CERTH @ MediaEval 2012 Social Event Detection Task
CERTH @ MediaEval 2012 Social Event Detection TaskCERTH @ MediaEval 2012 Social Event Detection Task
CERTH @ MediaEval 2012 Social Event Detection Task
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging Task
 
Overview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskOverview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy Task
 
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
 
BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012
BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012
BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking Task
 
ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term DetectionARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
ARF @ MediaEval 2012: A Romanian ASR-based Approach to Spoken Term Detection
 
QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo CollectionsQMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
 

Similaire à The Speech Recognition Virtual Kitchen

Rosinski ibm ai overview with several examples of projects in the media and l...
Rosinski ibm ai overview with several examples of projects in the media and l...Rosinski ibm ai overview with several examples of projects in the media and l...
Rosinski ibm ai overview with several examples of projects in the media and l...FIAT/IFTA
 
Practical, team-focused operability techniques for distributed systems - DevO...
Practical, team-focused operability techniques for distributed systems - DevO...Practical, team-focused operability techniques for distributed systems - DevO...
Practical, team-focused operability techniques for distributed systems - DevO...Matthew Skelton
 
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)Amazon Web Services
 
Leveraging Visual Testing with Your Functional Tests
Leveraging Visual Testing with Your Functional TestsLeveraging Visual Testing with Your Functional Tests
Leveraging Visual Testing with Your Functional TestsTEST Huddle
 
Managing Your Runtime With P2
Managing Your Runtime With P2Managing Your Runtime With P2
Managing Your Runtime With P2Pascal Rapicault
 
Tlu introduction-to-cloud
Tlu introduction-to-cloudTlu introduction-to-cloud
Tlu introduction-to-cloudVan Phuc
 
Adcss 2014 system data repository - yan tang
Adcss 2014   system data repository - yan tangAdcss 2014   system data repository - yan tang
Adcss 2014 system data repository - yan tangEuropean Patent Office
 
Get your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web TechnologiesGet your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web TechnologiesAndré Torkveen
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
 
Introduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersIntroduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersEmanuele Della Valle
 
IPC07 Talk - Beautiful Code with AOP and DI
IPC07 Talk - Beautiful Code with AOP and DIIPC07 Talk - Beautiful Code with AOP and DI
IPC07 Talk - Beautiful Code with AOP and DIRobert Lemke
 
Semantic opinion mining ontology
Semantic opinion mining ontologySemantic opinion mining ontology
Semantic opinion mining ontologyMohammed Almashraee
 
Surviving agile remote teams - why remote work is a SKILL
Surviving agile remote teams - why remote work is a SKILLSurviving agile remote teams - why remote work is a SKILL
Surviving agile remote teams - why remote work is a SKILLJens Broos
 
DevExForPlatformEngineers, introducing Kratix
DevExForPlatformEngineers, introducing KratixDevExForPlatformEngineers, introducing Kratix
DevExForPlatformEngineers, introducing KratixAbigail Bangser
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Development and QA dilemmas in DevOps
Development and QA dilemmas in DevOpsDevelopment and QA dilemmas in DevOps
Development and QA dilemmas in DevOpsMatteo Emili
 
Best practice adoption (and lack there of)
Best practice adoption (and lack there of)Best practice adoption (and lack there of)
Best practice adoption (and lack there of)John Pape
 
AWS_AI_Services_2023_edition_1679400999.pdf
AWS_AI_Services_2023_edition_1679400999.pdfAWS_AI_Services_2023_edition_1679400999.pdf
AWS_AI_Services_2023_edition_1679400999.pdfabcdsidprats
 
Salesforce Internship Presentation (Summer 2012)
Salesforce Internship Presentation (Summer 2012)Salesforce Internship Presentation (Summer 2012)
Salesforce Internship Presentation (Summer 2012)Nitin Jain
 

Similaire à The Speech Recognition Virtual Kitchen (20)

Rosinski ibm ai overview with several examples of projects in the media and l...
Rosinski ibm ai overview with several examples of projects in the media and l...Rosinski ibm ai overview with several examples of projects in the media and l...
Rosinski ibm ai overview with several examples of projects in the media and l...
 
Practical, team-focused operability techniques for distributed systems - DevO...
Practical, team-focused operability techniques for distributed systems - DevO...Practical, team-focused operability techniques for distributed systems - DevO...
Practical, team-focused operability techniques for distributed systems - DevO...
 
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)
AWS re:Invent 2016: Building a Solid Business Case for Cloud Migration (ENT308)
 
Leveraging Visual Testing with Your Functional Tests
Leveraging Visual Testing with Your Functional TestsLeveraging Visual Testing with Your Functional Tests
Leveraging Visual Testing with Your Functional Tests
 
Managing Your Runtime With P2
Managing Your Runtime With P2Managing Your Runtime With P2
Managing Your Runtime With P2
 
Tlu introduction-to-cloud
Tlu introduction-to-cloudTlu introduction-to-cloud
Tlu introduction-to-cloud
 
Adcss 2014 system data repository - yan tang
Adcss 2014   system data repository - yan tangAdcss 2014   system data repository - yan tang
Adcss 2014 system data repository - yan tang
 
Get your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web TechnologiesGet your organization’s feet wet with Semantic Web Technologies
Get your organization’s feet wet with Semantic Web Technologies
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
 
Introduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS PractitionersIntroduction to Semantic Web for GIS Practitioners
Introduction to Semantic Web for GIS Practitioners
 
IPC07 Talk - Beautiful Code with AOP and DI
IPC07 Talk - Beautiful Code with AOP and DIIPC07 Talk - Beautiful Code with AOP and DI
IPC07 Talk - Beautiful Code with AOP and DI
 
Semantic opinion mining ontology
Semantic opinion mining ontologySemantic opinion mining ontology
Semantic opinion mining ontology
 
Surviving agile remote teams - why remote work is a SKILL
Surviving agile remote teams - why remote work is a SKILLSurviving agile remote teams - why remote work is a SKILL
Surviving agile remote teams - why remote work is a SKILL
 
Jtf new
Jtf newJtf new
Jtf new
 
DevExForPlatformEngineers, introducing Kratix
DevExForPlatformEngineers, introducing KratixDevExForPlatformEngineers, introducing Kratix
DevExForPlatformEngineers, introducing Kratix
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Development and QA dilemmas in DevOps
Development and QA dilemmas in DevOpsDevelopment and QA dilemmas in DevOps
Development and QA dilemmas in DevOps
 
Best practice adoption (and lack there of)
Best practice adoption (and lack there of)Best practice adoption (and lack there of)
Best practice adoption (and lack there of)
 
AWS_AI_Services_2023_edition_1679400999.pdf
AWS_AI_Services_2023_edition_1679400999.pdfAWS_AI_Services_2023_edition_1679400999.pdf
AWS_AI_Services_2023_edition_1679400999.pdf
 
Salesforce Internship Presentation (Summer 2012)
Salesforce Internship Presentation (Summer 2012)Salesforce Internship Presentation (Summer 2012)
Salesforce Internship Presentation (Summer 2012)
 

Plus de MediaEval2012

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval2012
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding MediaEval2012
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012MediaEval2012
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...MediaEval2012
 
The MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioThe MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioMediaEval2012
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskMediaEval2012
 
ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...
ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...
ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...MediaEval2012
 
UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskUNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskMediaEval2012
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...MediaEval2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesMediaEval2012
 

Plus de MediaEval2012 (12)

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 Opening
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
 
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
 
mevd2012 esra_
 mevd2012 esra_ mevd2012 esra_
mevd2012 esra_
 
The MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioThe MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes Detectio
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
 
ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...
ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...
ARF @ MediaEval 2012: An Uninformed Approach to Violence Detection in Hollywo...
 
UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskUNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
 

Dernier

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 

Dernier (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

The Speech Recognition Virtual Kitchen

  • 1. The Speech Recognition Virtual Kitchen Florian Metze and Eric Fosler-Lussier INTERSPEECH 2012
  • 2. Multimedia Retrieval and Summarization  “Traditional” Multimedia Retrieval and Summarization  Select frames and shots that are most informative  Save user time by avoiding repetitions etc. (BBC Rushes Summarization)  Recent Advances in Natural Language Processing  Replace “extractive” summarization of text with “abstractive” techniques  Use Statistical Machine Translation as a general technique to convert long “foreign” symbol sequence into concise English text  Would this not apply nicely to Multi-media?  Easily have huge amounts of data  “Skimming”, “tagging” with keywords, or “liking” clearly doesn’t do justice to relevance, complexity and potential of Multi-media
  • 3. What’s Next?  Generate more detailed synopses, add temporal aspects, properties  Add more modalities (sounds, etc.)  “What is in these videos?”  Text could summarize multiple videos at once  Attract interest to (groups of) videos  “Why is this video relevant? Or different?”  Text can relate a retrieved video to the query  Text can potentially flag false alarms, outliers
  • 5. Feature Definition Event name: Changing a vehicle tire  Extract candidates for relevant objects from “Event Kit” Definition: One or more people work to replace a tire on a vehicle  Determine salient objects from MED features Explication: A vehicle is any device, motorized or not, used to transport people and/or other items. Tires are ring-shaped inflated objects, usually made of rubber, that fit over the wheel of a  Intersect both sets vehicle. The process for replacing a tire includes removing the existing tire and installing the new tire onto the wheel of the vehicle. Tires typically are replaced because they are damaged or worn down. If a tire is damaged and loses air pressure as a result, it is called a "flat tire". Generally the driver of the  Use ontologies to resolve vehicle with a flat tire will stop the vehicle as soon as possible and replace the affected tire with a temporary tire called a "spare tire”, which may be stored elsewhere on/in the vehicle. In other cases, the tire may be changed not by the vehicle operator, but by a professional (e.g. a mechanic) who may synonyms, etc use dedicated tools and work in a repair shop or similar setting.  Combine data-driven and knowledge based sources Evidential description: scene: garage, outdoors, street, parking lot
  • 6. MER Approach: Feature Extraction  What to mention:  Take visual evidence (for 100s of classes) for video  Re-rank using manually determined “importance”  How to mention:  Present as corroborating or contraindicative evidence  Place additional constraints  Similar for ASR hypotheses  Based on unigrams for now  Move from “hand-engineered” to automatic methods  Now: similar to Tf/ Idf measure, BOW features  Future: Bipartite graph matching to determine “good” concepts
  • 7. INTERSPEECH AFTERPARTY “Speech Recognition Virtual Kitchen”  Broadway 3 & 4  4:30pm on Thursday, September 13  We want your input to grow this idea further – show your support  Come and see more demos of VMs Kitchen Server Third-Party Server  Discuss with potential users or content SW/ SW/ VM VM Data Data providers from outside the speech Repo Repo community ➀ ➁ Host PC Virtual Machine  Present your own ideas in a short • So ware and Data Virtual • Example Scripts Data • Tutorials presentation(?) ➂ Machine • Reference/ Sample Log-files • …  http://www.speechkitchen.org/