Spoken Web Search at Mediaeval 2013
Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes
Spoken Audio Search (or Query-by-Example Spoken-Term Detection)
Given a spoken query, we search for instances of it at the lexical level within spoken documents.
It is similar to Spoken Term Detection (NIST STD2006, OpenKWS 2013), but:
• Queries are spoken
• Different speakers
• Different acoustic conditions
• No prior knowledge of the language(s) might be available
SWS history in Mediaeval
• SWS 2011 had 5 finishing participants and
focused on 4 Indian languages
• SWS 2012 had 9 finishing participants and
focused on 4 African Languages
• SWS 2013 has 13 finishing (18 registered)
participants and contains 9 languages
[Chart: number of teams (#teams) and database size per year, 2011 to 2013]
SWS 2013 evaluation setup
• 1 single search corpus with ~20 hours of
data, collected from contributions of 9
languages
– No transcription or language information is given
to participants

• 500 queries for dev and 500 queries for eval
– For each query, participants need to return all
instances of that query in the search corpus
Mediaeval SWS 2013
• 9 languages in different acoustic contexts: 4 African languages
(isixhosa, isizulu, sepedi, setswana), Albanian, Basque, Czech,
non-native English, Romanian
Set | #utts | Time | Avg. length/utt.
Search corpus | 10762 | 19:57:55h | 6.67s
Dev Queries | 505 | 0:11:26h | 1.35s
Extended dev* | 1046 | 0:08:42h | 0.49s
Eval Queries | 503 | 0:11:37h | 1.38s
Extended eval* | 1037 | 0:08:57h | 0.51s
Total | 13853 | 20:38:37h |
*Only Basque (3x) and Czech (10x) queries have extended versions
Database distribution per language
Language | Utterances / total duration | Queries (dev / eval) | Speech quality (original sampling rate) | Recording environment
African - isixhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - isizulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech
Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV broadcast news, 16 kHz | Studio, read speech
Czech | 3667 / 252 min. | 94 / 93 | Telephone speech, 8 kHz | Telephone calls into radio broadcasts, spontaneous speech
Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech
Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech
SWS 2013 participants
Team name | Country
Finishing teams (the first four are the organizers' teams):
Dto. Electricidad y Electrónica, Universidad del País Vasco | Spain
Speech@FIT, Brno University of Technology | Czech Republic
Telefonica Research | Spain
University Politehnica of Bucharest | Romania
School of Electrical and Computer Engineering, Georgia Institute of Technology | USA
L2F - INESC-ID | Portugal
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València | Spain
Audiolab, University of Zilina | Slovakia
LIA, University of Avignon | France
Technical University of Kosice | Slovakia
Universitat Pompeu Fabra | Spain
DSP-STL, Dept. of EE, The Chinese University of Hong Kong | Hong Kong
International Institute of Information Technology - Hyderabad | India
Non-finishing teams:
IAIS, Fraunhofer Institute | Germany
TATA Consultancy Services Ltd. | India
Indian Statistical Institute | India
Northwestern Polytechnical University of Xi’an | China
Toyota Technological Institute at Chicago | USA
Possible approaches to QbE-STD
[Diagram along a "Language spoken" axis: Pattern based; + Acoustic models: Lattice based; + Language models: Word-based]
Followed approaches
Team name (slide columns: DTW-like, AKWS)
Dto. Electricidad y Electrónica, Universidad del País Vasco
Speech@FIT, Brno University of Technology
Telefonica Research
University Politehnica of Bucharest
School of Electrical and Computer Engineering, Georgia Institute of Technology
L2F - INESC-ID
Dept. de Sistemes Informàtics i Computació, Universitat Politècnica de València
Audiolab, University of Zilina
LIA, University of Avignon
Technical University of Kosice
Universitat Pompeu Fabra
DSP-STL, Dept. of EE, The Chinese University of Hong Kong
International Institute of Information Technology - Hyderabad
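The DTW-like category covers systems that align query features against utterance features with dynamic time warping. As a rough illustration only, here is a minimal subsequence-DTW sketch in Python. The function name `subsequence_dtw`, the cosine frame distance, the three-step local path and the normalization by query length are all assumptions made for this example, not a description of any participant's system.

```python
import numpy as np

def subsequence_dtw(query, utterance):
    """Find the best match of `query` anywhere inside `utterance`.

    Both inputs are (frames, dims) arrays of non-zero feature vectors.
    Returns (cost normalized by query length, start frame, end frame).
    """
    # Cosine distance between every query frame and every utterance frame.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    u = utterance / np.linalg.norm(utterance, axis=1, keepdims=True)
    dist = 1.0 - q @ u.T
    n, m = dist.shape

    acc = np.full((n, m), np.inf)        # accumulated path cost
    start = np.zeros((n, m), dtype=int)  # utterance frame where the path began
    acc[0, :] = dist[0, :]               # the match may start at any frame
    start[0, :] = np.arange(m)

    for i in range(1, n):
        for j in range(m):
            # Predecessors: vertical, then diagonal and horizontal if they exist.
            best, s = acc[i - 1, j], start[i - 1, j]
            if j > 0:
                if acc[i - 1, j - 1] <= best:
                    best, s = acc[i - 1, j - 1], start[i - 1, j - 1]
                if acc[i, j - 1] < best:
                    best, s = acc[i, j - 1], start[i, j - 1]
            acc[i, j] = dist[i, j] + best
            start[i, j] = s

    end = int(np.argmin(acc[-1, :]))     # the match may end at any frame
    return float(acc[-1, end] / n), int(start[-1, end]), end
```

A detection score for the query in that utterance can then be derived from the returned cost, for example its negative, before any score normalization.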
Scoring metrics
• PRIMARY: Actual Term Weighted Value (ATWV) /
Maximum Term Weighted Value (MTWV)
• Actual/minimum Cnxe
• Real-time factor
• Memory usage
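To make the primary metric concrete, the sketch below computes a Term Weighted Value in the style of the NIST STD 2006 definition: one minus the average over terms of P_miss + beta * P_FA, with one non-target trial per second of speech. ATWV is this value at the threshold the system chose, and MTWV is the maximum over all thresholds. The function names, the detection tuple layout and the `beta` value used in any call are assumptions for illustration; the slides do not state the weighting used in SWS 2013.

```python
def twv_at_threshold(detections, n_true, speech_seconds, beta, threshold):
    """Term Weighted Value for detections scoring at or above `threshold`.

    detections: list of (term, score, is_hit) tuples.
    n_true: dict mapping each term to its number of reference occurrences.
    speech_seconds: duration of the search corpus in seconds.
    """
    counts = {term: [0, 0] for term in n_true}  # [hits, false alarms]
    for term, score, is_hit in detections:
        if score >= threshold:
            counts[term][0 if is_hit else 1] += 1

    total, n_terms = 0.0, 0
    for term, (hits, false_alarms) in counts.items():
        if n_true[term] == 0:
            continue  # terms with no reference occurrence are skipped
        p_miss = 1.0 - hits / n_true[term]
        p_fa = false_alarms / (speech_seconds - n_true[term])
        total += p_miss + beta * p_fa
        n_terms += 1
    return 1.0 - total / n_terms


def mtwv(detections, n_true, speech_seconds, beta):
    """Return (maximum TWV, threshold) over all observed score values."""
    thresholds = sorted({score for _, score, _ in detections})
    return max(
        (twv_at_threshold(detections, n_true, speech_seconds, beta, t), t)
        for t in thresholds
    )
```

The gap between ATWV and MTWV shows how well a system's decision threshold was tuned: the two coincide only when the chosen threshold is the optimal one.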
Primary metric (dev)
Primary metric (eval)
Per language results
Average for the 10-best systems
Per-language results: African (eval)
Per-language results: Albanian (eval)
Per-language results: Basque (eval)
Per-language results: Czech (eval)
Per-language results: Non-native English (eval)
Per-language results: Romanian (eval)
DET dev
[DET plot: miss probability (in %) versus false alarm probability (in %), primary systems (development), with a Random Performance reference curve]
System | MTWV | Threshold
GTTS | 0.417 | 5.204
L2F | 0.390 | 3.428
CUHK | 0.368 | 0.530
BUT | 0.371 | 0.930
CMTECHETAL | 0.264 | 16.535
IIITH | 0.253 | 2.130
ELIRF | 0.170 | 2.697
TID | 0.116 | 4.085
GTC | 0.116 | 3.248
SPEED | 0.083 | 0.960
LIA-Late | 0.005 | 13.065
UNIZA-Late | 0.000 | 1.000
TUKE-Late | 0.000 | 3.000
DET eval
[DET plot: miss probability (in %) versus false alarm probability (in %), primary systems (evaluation), with a Random Performance reference curve]
System | MTWV | Threshold
GTTS | 0.399 | 5.243
L2F | 0.342 | 3.551
CUHK | 0.306 | 0.618
BUT | 0.297 | 0.914
CMTECHETAL | 0.257 | 18.153
IIITH | 0.224 | 2.721
ELIRF | 0.159 | 2.759
TID | 0.093 | 5.051
GTC | 0.084 | 3.341
SPEED | 0.059 | 0.923
LIA-Late | 0.000 | 1079.003
UNIZA-Late | 0.001 | 1.000
TUKE-Late | 0.000 | 3.000
Cnxe metric
[Bar chart: Cnxe for primary systems, showing Min Cnxe and Act Cnxe on development and evaluation data for GTTS, L2F, CUHK, BUT, CMTECHETAL, IIITH, ELIRF, TID, GTC, SpeeD, LIA, UNIZA and TUKE; vertical axis from 0 to 3]
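The slides do not define Cnxe, so the following is only a sketch of the usual normalized cross-entropy formulation: system scores are treated as log-likelihood ratios, turned into posteriors with a target prior, and the resulting empirical cross-entropy is divided by the entropy of the prior. The function name `cnxe` and the `p_target` argument are assumptions for this example. It corresponds to the "actual" value; a "minimum" value would additionally require calibrating the scores first, which is not shown.

```python
import math

def cnxe(target_llrs, nontarget_llrs, p_target):
    """Normalized cross-entropy of scores interpreted as log-likelihood ratios.

    target_llrs / nontarget_llrs: scores of true and false trials.
    p_target: assumed prior probability of a target trial (0 < p_target < 1).
    """
    prior_logit = math.log(p_target / (1.0 - p_target))

    def neg_log2_sigmoid(x):
        # -log2(1 / (1 + exp(-x))), written to avoid overflow for large |x|.
        if x >= 0:
            return math.log1p(math.exp(-x)) / math.log(2)
        return (-x + math.log1p(math.exp(x))) / math.log(2)

    # Cost of the posterior assigned to the correct class, per trial type.
    c_tar = sum(neg_log2_sigmoid(llr + prior_logit) for llr in target_llrs)
    c_tar /= len(target_llrs)
    c_non = sum(neg_log2_sigmoid(-(llr + prior_logit)) for llr in nontarget_llrs)
    c_non /= len(nontarget_llrs)

    c_xe = p_target * c_tar + (1.0 - p_target) * c_non
    c_prior = -(p_target * math.log2(p_target)
                + (1.0 - p_target) * math.log2(1.0 - p_target))
    return c_xe / c_prior
```

Under this formulation a system that outputs no information (all scores equal to 0) gets exactly 1, well-calibrated informative scores fall below 1, and badly calibrated scores can exceed 1.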
Extended Queries
• 4 teams submitted 4 extended systems, making use of the available
repetitions of the queries: 3 for Basque and 10 for Czech
– TID: computes each query individually and then puts together all
results
– GTTS: DTW-aligns all queries above a minimum duration and searches
with the resulting query
– GeorgiaTech: builds a graphical keyword model using more than one
instance
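The first strategy above, searching with each query repetition separately and then combining the results, could be sketched as follows. The slides do not say how the results are put together, so the merging rule here (pool all detections, merge those that overlap in time within the same utterance, keep the highest score) and the function name `merge_detections` are purely illustrative assumptions.

```python
def merge_detections(per_instance_results):
    """Combine detections obtained with several repetitions of one query.

    per_instance_results: one list per query repetition, each holding
    (utterance_id, start_sec, end_sec, score) tuples.
    Returns a single list where overlapping detections are merged.
    """
    pooled = sorted(
        (det for result in per_instance_results for det in result),
        key=lambda det: (det[0], det[1]),
    )
    merged = []
    for utt, start, end, score in pooled:
        if merged and merged[-1][0] == utt and start <= merged[-1][2]:
            # Overlaps the previous detection: extend it and keep the best score.
            prev = merged[-1]
            merged[-1] = (utt, prev[1], max(prev[2], end), max(prev[3], score))
        else:
            merged.append((utt, start, end, score))
    return merged
```

Other combination rules, such as averaging scores or counting how many repetitions agree on a detection, fit the same structure by changing the line that updates the merged entry.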
Extended systems
Real-Time Factor versus Memory usage
Real-Time Factor versus Memory usage (partial)
Take home messages
• The task was more complicated than in 2012
– GTTS got MTWV-13 = 0.39, MTWV-12 = 0.51 (on 2013 data)
– CUHK: MTWV-12 = 0.74 (on 2012 data)

• It is possible to do QbE-STD on unknown/low-resource data
New things to watch out for in the posters session
• BUT:
– Fusion of 26 systems (13 AKWS + 13 DTW)
– M-norm normalization

• IIIT:
– Articulatory Bottleneck features

• CUHK:
– Tokenizer construction using Gaussian Component clustering
– Query expansion using PSOLA

• L2F:
– DTW candidate pre-selection

• GTTS:
– Distance matrix normalization in DTW

• GeorgiaTech:
– Low-resource speech modeling using EHMM Models

• LIA:
– Use of I-vectors in SWS

• ARF:
– DTW string matching algorithm with a novel scoring
System presentations
• 16:30-16:45 "GTTS Systems for the SWS Task at
MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE,
Universidad del País Vasco
• 16:45-17:00 "The L2F Spoken Web Search system for
Mediaeval 2013", Alberto Abad, L2F, INESC-ID
• 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL
APPROACH", Lucas Ondel, Speech@BUT, Brno
University of Technology
• 17:15-17:30 "The CMTECH Spoken Web Search System
for MediaEval 2013", Ciro Gracia, UPF
• 17:30-17:45 Discussion and SWS 2014 teaser, Xavier
Anguera

Editor's notes

  1. AKWS means they use some sort of Viterbi algorithm. DTW-like means they use DTW algorithms to match different sorts of features.
  2. UPF has very good regularization for finding the optimal score across all queries. TID and IIIT have a poor match between ATWV and MTWV. Only the positive scores were plotted.