BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012

System overview

Our internal task was
− to build a simple, minimalistic, language-dependent
Query-by-Example (QbE) system.


Ingredients
− Development data, Neural net classifier,
Phoneme recognizer, Acoustic keyword
spotting, DTW, Calibration

MediaEval SWS 2012 BUT2012 3
workshop - 4.-5.10. Pisa

System overview

Sentence mean normalization

Neural network based features
− bottle-necks
− three state phone posteriors

Query detector
− AKWS
− DTW
− (GMM/HMM) – not submitted to the evals

Features used by each detector:
             Bottle-Neck   Posteriors
AKWS         -             X
DTW          X             X
(GMM/HMM)    X             -
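
The sentence mean normalization step listed on this slide can be illustrated with a minimal sketch. It assumes the features of one sentence are stored as a NumPy array of shape (frames, dimensions); the function name `sentence_mean_normalize` is hypothetical, and the slides do not say whether variance was normalized as well, so only the mean is removed here.

```python
import numpy as np


def sentence_mean_normalize(features: np.ndarray) -> np.ndarray:
    """Subtract the per-sentence mean of every feature dimension.

    features: array of shape (n_frames, n_dims) for ONE sentence (assumed layout).
    """
    return features - features.mean(axis=0, keepdims=True)


# Toy sentence: 4 frames, 2 feature dimensions.
sentence = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [4.0, 16.0]])
normalized = sentence_mean_normalize(sentence)
```

After normalization each feature dimension has zero mean over the sentence, which removes constant channel offsets before the features reach the query detector.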

Underlying technologies

Universal context, bottle-neck neural network based classifier

devC state re-alignment, Reduced phone set (50 phonemes)

Trained by Tnet – our tool, publicly available

Phnrec, R-AKWS, AKWS

Phoneme recognizer - free phone loop, devC 66.02% PAC


R-AKWS - Queries extracted from phone alignment

AKWS - Queries extracted from phone recognizer
            devQ - devC                      devQ - evalC
            MTWV    MTWVcalib   UBTWV        MTWV    MTWVcalib   UBTWV
R-AKWS      0.739   0.786       0.859        0.653   0.703       0.789
AKWS        0.452   0.493       0.600        0.377   0.429       0.552

DTW

Used as a baseline. Bottle-necks are better than posteriors.

            devQ - devC                      evalQ - evalC
            MTWV    MTWVcalib   UBTWV        MTWV    MTWVcalib   UBTWV
R-AKWS      0.739   0.786       0.859        -       -           -
AKWS        0.452   0.493       0.600        0.470   0.530       0.672
DTW         0.400   0.468       0.552        0.426   0.488       0.599
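
As an illustration of the DTW detector, here is a minimal sketch of subsequence DTW, where the query may start and end anywhere in the utterance. The slides do not specify the local distance, the path constraints or the score normalization, so the Euclidean frame distance, the three standard steps and the division by query length are all assumptions, and `subsequence_dtw` is a hypothetical name.

```python
import numpy as np


def subsequence_dtw(query: np.ndarray, utterance: np.ndarray) -> tuple[float, int]:
    """Find the best match of `query` inside `utterance`.

    Both inputs are (n_frames, n_dims) feature matrices, e.g. bottle-neck features.
    Returns (length-normalized cost, index of the last matched utterance frame).
    """
    # Local distance between every query frame and every utterance frame
    # (Euclidean distance is an assumption, not taken from the slides).
    dist = np.linalg.norm(query[:, None, :] - utterance[None, :, :], axis=2)
    n_q, n_u = dist.shape

    acc = np.full((n_q, n_u), np.inf)
    acc[0, :] = dist[0, :]  # free start: the match may begin at any utterance frame
    for i in range(1, n_q):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, n_u):
            acc[i, j] = dist[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])

    end = int(np.argmin(acc[-1]))  # free end: best cost anywhere in the last row
    return float(acc[-1, end]) / n_q, end


# Toy check: cut the query out of the utterance itself.
rng = np.random.default_rng(0)
utt = rng.normal(size=(20, 3))
cost, end_frame = subsequence_dtw(utt[5:9], utt)
```

A detection would then be emitted wherever the normalized cost falls below a threshold; lower cost means a better match, so the sign has to be flipped before using it as a detection score.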

GMM/HMM

Inspired by AKWS, not submitted due to bad results.

            MTWV    MTWVcalib   UBTWV
R-AKWS      0.739   0.786       0.859
AKWS        0.452   0.493       0.600
DTW         0.400   0.468       0.552
GMM/HMM     0.011   -           0.336

Calibration

TWV - pooled, UBTWV - non-pooled TWV (each term has its best threshold)
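
The difference between the pooled and the non-pooled metric can be made concrete with a small sketch. It follows the usual NIST-style term-weighted value, 1 − (P_miss + β·P_FA) averaged over terms; the cost weight `beta`, the number of non-target trials `n_nontarget` and all function names are placeholders, since the exact settings of the adapted TWV used in the evaluation are not given on the slide.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Detection:
    score: float
    is_hit: bool  # True when the detection matches a reference occurrence


def term_value(dets, n_true, n_nontarget, beta, thr):
    """1 - (P_miss + beta * P_FA) for one term; assumes n_true > 0."""
    kept = [d for d in dets if d.score >= thr]
    hits = sum(d.is_hit for d in kept)
    false_alarms = len(kept) - hits
    return 1.0 - ((1.0 - hits / n_true) + beta * false_alarms / n_nontarget)


def candidate_thresholds(dets):
    # Every observed score, plus +inf for "accept nothing".
    return sorted({d.score for d in dets}) + [inf]


def twv(dets_by_term, n_true, n_nontarget, beta, thr):
    """Pooled TWV: one threshold shared by all terms."""
    values = [term_value(dets_by_term.get(t, []), n_true[t], n_nontarget, beta, thr)
              for t in n_true]
    return sum(values) / len(values)


def mtwv(dets_by_term, n_true, n_nontarget, beta):
    """Maximum pooled TWV over all candidate thresholds."""
    pooled = [d for dets in dets_by_term.values() for d in dets]
    return max(twv(dets_by_term, n_true, n_nontarget, beta, thr)
               for thr in candidate_thresholds(pooled))


def ubtwv(dets_by_term, n_true, n_nontarget, beta):
    """Upper-bound TWV: every term is scored at its own best threshold."""
    best = [max(term_value(dets_by_term.get(t, []), n_true[t], n_nontarget, beta, thr)
                for thr in candidate_thresholds(dets_by_term.get(t, [])))
            for t in n_true]
    return sum(best) / len(best)


# Toy data: term "b" scores lower than term "a", so one shared threshold hurts.
toy_dets = {
    "a": [Detection(0.9, True), Detection(0.8, False)],
    "b": [Detection(0.3, True), Detection(0.2, False)],
}
toy_true = {"a": 1, "b": 1}
```

UBTWV can never be lower than MTWV, and the gap between them indicates how much a per-term score calibration could gain at most.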

Calibration of scores (linear combination of 12 parameters - 6 features
with linear and quadratic forms). Trained on UBTWV thresholds.
− Query length (w/o outer sil), Length of inner sil,
− Score average global, Score average by phonemes
− Phonemes count, Detections count

We found that Detections count and Length of inner sil work the best for
AKWS (after evals).
Parameter                       Training error AKWS   Training error DTW
Detections count                0.1272                0.002115
Length of inner sil             0.1577                0.002687
Query length (w/o outer sil)    0.1626                0.002773
Score average global            0.1635                0.002530
Phonemes count                  0.1656                0.002779
Score average by phonemes       0.1660                0.002746
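
The calibration described on this slide (6 features, each in linear and quadratic form, trained on the UBTWV thresholds) might look roughly like the sketch below. Several details are assumptions rather than facts from the slide: ordinary least squares as the training method, no bias term (to keep exactly 12 parameters), and subtracting the predicted threshold from the raw scores. The function names are hypothetical.

```python
import numpy as np

# The six per-query features named on the slide, in an arbitrary column order.
FEATURES = [
    "query_length", "inner_sil_length", "score_avg_global",
    "score_avg_by_phonemes", "phonemes_count", "detections_count",
]


def expand(features: np.ndarray) -> np.ndarray:
    """Map (n_queries, 6) features to (n_queries, 12): linear, then quadratic forms."""
    return np.hstack([features, features ** 2])


def fit_calibration(features: np.ndarray, best_thresholds: np.ndarray) -> np.ndarray:
    """Least-squares fit of the 12 weights to the per-query UBTWV thresholds."""
    weights, *_ = np.linalg.lstsq(expand(features), best_thresholds, rcond=None)
    return weights


def calibrate(scores: np.ndarray, query_features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Shift all scores of one query by its predicted best threshold."""
    predicted_thr = expand(query_features[None, :]) @ weights
    return scores - predicted_thr[0]
```

With this shift, a detection scoring exactly at the predicted per-query threshold ends up at 0, so a single pooled threshold can then be applied across all queries.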

Calibration AKWS

Conclusion
            devQ-devC                     evalQ-evalC
            ATWV      MTWV      UBTWV     ATWV      MTWV      UBTWV
AKWS        0.488     0.502     0.600     0.522     0.553     0.672
            (0.488)   (0.452)             (0.492)   (0.530)
DTW         0.443     0.468     0.552     0.448     0.488     0.599

• AKWS with new calibration (submitted results in brackets)
• Good and consistent data, enough to train a good Phnrec
• GMM/HMM does not perform well in the in-language condition
and with 1 example per query (it was our best system last year)
• Number of detections is an important calibration feature (due
to TWV)
• Future work: detections calibration, system fusion

Like / Dislike / Next evals?

Like:
− Adapted TWV, real KWS scoring
− Phone alignment provided
− Good data, great work of organizers

"Dislike":
− No test data alignment
− No speaker information

Next evals:
− More examples per query?
− Provide query and the query sentence (adaptation issue)?
− Non-pooled scoring metric?
− We would like to share our features – more at the poster
session

Thank you for your attention.

BUT2012 APPROACHES FOR SPOKEN WEB SEARCH - MEDIAEVAL 2012

1. BUT2012
Brno University of Technology, Faculty of Information Technology, Speech@FIT
Igor Szöke, Michal Fapšo, Karel Veselý
MediaEval 2012 workshop – SWS task, October 4-5, 2012, Pisa

2. Outline
− Systems overview & Underlying technologies
− PhnRec, R-AKWS, AKWS – primary system
− DTW
− (GMM/HMM) – not submitted
− Calibration
− Results and discussion
