Mediaeval 2013 Spoken Web Search results slides

  1. Spoken Web Search at MediaEval 2013
     Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes
  2. Spoken Audio Search (or Query-by-Example Spoken-Term Detection)
     Given a spoken query, we search for instances at the lexical level within spoken documents.
     It is similar to Spoken Term Detection (NIST STD2006, OpenKWS 2013) but:
     • Queries are spoken
     • Different speakers
     • Different acoustic conditions
     • No prior knowledge of the language(s) might be available
  3. SWS history in MediaEval
     • SWS 2011 had 5 finishing participants and focused on 4 Indian languages
     • SWS 2012 had 9 finishing participants and focused on 4 African languages
     • SWS 2013 has 13 finishing (18 registered) participants and contains 9 languages
     [Chart: number of teams and database size per year, 2011 to 2013]
  4. SWS 2013 evaluation setup
     • 1 single search corpus with ~20 hours of data, collected from contributions in 9 languages
       – No transcription or language information is given to participants
     • 500 queries for dev and 500 queries for eval
       – For each query, participants need to return all instances of that query in the search corpus
  5. MediaEval SWS 2013
     • 9 languages in different acoustic contexts: 4 African languages (isixhosa, isizulu, sepedi, setswana), Albanian, Basque, Czech, non-native English, Romanian
     Set | #utts | Time | Avg. length/utt.
     Search corpus | 10762 | 19:57:55 h | 6.67 s
     Dev queries | 505 | 0:11:26 h | 1.35 s
     Extended dev* | 1046 | 0:08:42 h | 0.49 s
     Eval queries | 503 | 0:11:37 h | 1.38 s
     Extended eval* | 1037 | 0:08:57 h | 0.51 s
     Total | 13853 | 20:38:37 h |
     *Only Basque (3x) and Czech (10x) queries have extended versions
  6. Database distribution per language
     Language | Number of utterances / total duration | Number of queries (dev / eval) | Speech quality (original sampling rate) | Recording environment
     African - isixhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
     African - isizulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
     African - sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
     African - setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
     Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech
     Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV Broadcast news, 16 kHz | Studio, read speech
     Czech | 3667 / 252 min. | 94 / 93 | Telephone speech, 8 kHz | Telephone calls into radio broadcasts, spontaneous speech
     Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech
     Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech
  7. SWS 2013 participants
     Finishing teams (team name, country):
     – Dto. Electricidad y electrónica, Universidad Pais Vasco (Spain)
     – Speech@FIT, Brno University of Technology (Czech Republic)
     – Telefonica Research (Spain)
     – University Politehnica of Bucharest (Romania)
     – School of Electrical and Computer Engineering, Georgia Institute of Technology (USA)
     – L2F - INESC-ID (Portugal)
     – Departament de sistemes informàtics i Computació, Universitat Politècnica de València (Spain)
     – Audiolab, University of Zilina (Slovakia)
     – LIA, University of Avignon (France)
     – Technical University of Kosice (Slovakia)
     – Universitat Pompeu Fabra (Spain)
     – DSP-STL, Dept. of EE, The Chinese University of Hong Kong (Hong Kong)
     – International Institute of Information Technology-Hyderabad (India)
     Non-finishing teams (team name, country):
     – IAIS, Fraunhofer Institute (Germany)
     – TATA Consultancy Services Ltd. (India)
     – Indian Statistical Institute (India)
     – Northwestern Polytechnical University of Xi’an (China)
     – Toyota Technological Institute at Chicago (USA)
     The slide legend also includes the label "organizers".
  8. Possible approaches to QbE-STD
     [Diagram with the label "Language spoken" and three approach families: Pattern based; Acoustic models + Lattice based; Language models + Word-based]
  9. Followed approaches
     [Table: one row per team, with columns "DTW-like" and "AKWS"; the per-team marks are not preserved in this transcript]
     – Dto. Electricidad y electrónica, Universidad Pais Vasco
     – Speech@FIT, Brno University of Technology
     – Telefonica Research
     – University Politehnica of Bucharest
     – School of Electrical and Computer Engineering, Georgia Institute of Technology
     – L2F - INESC-ID
     – Dept. de sistemes informàtics i Computació, Universitat Politècnica de València
     – Audiolab, University of Zilina
     – LIA, University of Avignon
     – Technical University of Kosice
     – Universitat Pompeu Fabra
     – DSP-STL, Dept. of EE, The Chinese University of Hong Kong
     – International Institute of Information Technology-Hyderabad
  10. Scoring metrics
     • PRIMARY: Actual Term Weighted Value (ATWV) / Maximum Term Weighted Value (MTWV)
     • Actual/minimum Cnxe
     • Real-time factor
     • Memory usage
  11. Primary metric (dev)
  12. Primary metric (eval)
  13. Per-language results: average for the 10 best systems
  14. Per-language results: African (eval)
  15. Per-language results: Albanian (eval)
  16. Per-language results: Basque (eval)
  17. Per-language results: Czech (eval)
  18. Per-language results: Non-native English (eval)
  19. Per-language results: Romanian (eval)
  20. DET dev
     [DET plot: primary systems (development); miss probability (in %) against false alarm probability (in %), with a random-performance reference curve]
     System | MTWV | Threshold
     GTTS | 0.417 | 5.204
     L2F | 0.390 | 3.428
     CUHK | 0.368 | 0.530
     BUT | 0.371 | 0.930
     CMTECHETAL | 0.264 | 16.535
     IIITH | 0.253 | 2.130
     ELIRF | 0.170 | 2.697
     TID | 0.116 | 4.085
     GTC | 0.116 | 3.248
     SPEED | 0.083 | 0.960
     LIA-Late | 0.005 | 13.065
     UNIZA-Late | 0.000 | 1.000
     TUKE-Late | 0.000 | 3.000
  21. DET eval
     [DET plot: primary systems (evaluation); miss probability (in %) against false alarm probability (in %), with a random-performance reference curve]
     System | MTWV | Threshold
     GTTS | 0.399 | 5.243
     L2F | 0.342 | 3.551
     CUHK | 0.306 | 0.618
     BUT | 0.297 | 0.914
     CMTECHETAL | 0.257 | 18.153
     IIITH | 0.224 | 2.721
     ELIRF | 0.159 | 2.759
     TID | 0.093 | 5.051
     GTC | 0.084 | 3.341
     SPEED | 0.059 | 0.923
     LIA-Late | 0.000 | 1079.003
     UNIZA-Late | 0.001 | 1.000
     TUKE-Late | 0.000 | 3.000
  22. Cnxe metric
     [Bar chart: Cnxe for primary systems, showing minimum and actual Cnxe on development and evaluation data for GTTS, L2F, CUHK, BUT, CMTECHETAL, IIITH, ELIRF, TID, GTC, SpeeD, LIA, UNIZA and TUKE]
  23. Extended Queries
     • 4 teams submitted 4 extended systems, making use of the 3 repetitions of Basque queries and 10 repetitions of Czech queries available
       – TID: searches with each query individually and then puts together all results
       – GTTS: DTW-aligns all queries above a minimum duration and searches with the resulting query
       – GeorgiaTech: builds a graphical keyword model using more than one instance
  24. Extended systems
  25. Extended systems
  26. Extended systems
  27. Extended systems
  28. Real-Time Factor versus Memory usage
  29. Real-Time Factor versus Memory usage (partial)
  30. Take home messages
     • The task was more complicated than in 2012
       – GTTS got MTWV-13 = 0.39, MTWV-12 = 0.51 (on 2013 data)
       – CUHK MTWV-12 = 0.74 (on 2012 data)
     • It is possible to do QbE-STD on unknown or low-resource data
  31. New things to watch out for in the poster session
     • BUT:
       – Fusion of 26 systems (13 AKWS + 13 DTW)
       – M-norm normalization
     • IIIT:
       – Articulatory bottleneck features
     • CUHK:
       – Tokenizer construction using Gaussian component clustering
       – Query expansion using PSOLA
     • L2F:
       – DTW candidate pre-selection
     • GTTS:
       – Distance matrix normalization in DTW
     • GeorgiaTech:
       – Low-resource speech modeling using EHMM models
     • LIA:
       – Use of i-vectors in SWS
     • ARF:
       – DTW string matching algorithm with a novel scoring
  32. System presentations
     • 16:30-16:45 "GTTS Systems for the SWS Task at MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE, Universidad del País Vasco
     • 16:45-17:00 "The L2F Spoken Web Search system for Mediaeval 2013", Alberto Abad, L2F, INESC-ID
     • 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL APPROACH", Lucas Ondel, Speech@BUT, Brno University of Technology
     • 17:15-17:30 "The CMTECH Spoken Web Search System for MediaEval 2013", Ciro Gracia, UPF
     • 17:30-17:45 Discussion and SWS 2014 teaser, Xavier Anguera
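
The primary metric named on the scoring-metrics slide can be illustrated with a small sketch. It assumes the term-weighted value as defined for NIST STD 2006: one minus the average, over query terms, of the miss probability plus a weight `beta` times the false-alarm probability, with one non-target trial per second of speech. The value `BETA = 999.9` is the NIST STD 2006 default and is an assumption here; the slides do not state the setting used in SWS 2013. The function names, the input layout and the choice of candidate thresholds are hypothetical, chosen only for illustration.

```python
import math

BETA = 999.9  # assumption: NIST STD 2006 default, not necessarily the SWS 2013 setting


def twv(detections, n_true, speech_seconds, threshold, beta=BETA):
    """Term-weighted value at one decision threshold.

    detections: {term: [(score, is_hit), ...]} candidate detections per query term
    n_true: {term: number of reference occurrences in the search corpus}
    speech_seconds: corpus duration in seconds (one trial per second is assumed)
    """
    costs = []
    for term, n_ref in n_true.items():
        if n_ref == 0:
            continue  # terms without reference occurrences are left out of the average
        accepted = [is_hit for score, is_hit in detections.get(term, [])
                    if score >= threshold]
        n_correct = sum(accepted)
        n_false_alarm = len(accepted) - n_correct
        p_miss = 1.0 - n_correct / n_ref
        p_fa = n_false_alarm / (speech_seconds - n_ref)
        costs.append(p_miss + beta * p_fa)
    return 1.0 - sum(costs) / len(costs)


def mtwv(detections, n_true, speech_seconds, beta=BETA):
    """Best TWV over all candidate thresholds, returned as (value, threshold)."""
    scores = {score for dets in detections.values() for score, _ in dets}
    # math.inf rejects every detection, which gives TWV = 0
    thresholds = sorted(scores) + [math.inf]
    return max((twv(detections, n_true, speech_seconds, t, beta), t)
               for t in thresholds)
```

Under these assumptions, ATWV corresponds to calling `twv` with the threshold the system actually committed to, while MTWV is the value returned by `mtwv`. A gap between the two indicates a poorly calibrated threshold rather than poor detection scores.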

Editor's notes

  • AKWS (acoustic keyword spotting) means they use some sort of Viterbi algorithm. DTW-like means they use DTW algorithms to match different sorts of features.
  • UPF has very good regularization for finding the optimal score across all queries. TID and IIIT have a poor match between ATWV and MTWV. Only the positive scores were plotted.
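
To make the "DTW-like" label in the notes above concrete, here is a minimal sketch of subsequence dynamic time warping, a common way to locate a short query inside a longer utterance. It is not any participant's actual system. The function name `subsequence_dtw`, the Euclidean frame distance and the normalization by query length are assumptions made for illustration; real systems differ in features, local distances, path constraints and score normalization. The query is assumed to be non-empty.

```python
import math


def subsequence_dtw(query, document):
    """Find the document span that best matches the query.

    query, document: sequences of feature vectors (tuples of floats).
    Returns (cost normalized by query length, start frame, end frame),
    with both frame indices inclusive and 0-based.
    """
    n, m = len(query), len(document)
    acc = [[0.0] * m for _ in range(n)]    # accumulated alignment cost
    start = [[0] * m for _ in range(n)]    # document frame where each path began

    # The first query frame may align with any document frame at no extra cost.
    for j in range(m):
        acc[0][j] = math.dist(query[0], document[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            local = math.dist(query[i], document[j])
            # Query advances while the document frame repeats.
            candidates = [(acc[i - 1][j], start[i - 1][j])]
            if j > 0:
                # Both advance, or only the document advances.
                candidates.append((acc[i - 1][j - 1], start[i - 1][j - 1]))
                candidates.append((acc[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(candidates)
            acc[i][j] = local + best_cost
            start[i][j] = best_start

    # The match may end at any document frame; pick the cheapest ending.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end
```

In a query-by-example setting, the returned cost would typically be converted into a detection score and compared against a threshold, and the search would be repeated over every utterance in the corpus.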
