CUNI at MediaEval 2012: Search and Hyperlinking Task

Published in: Technology

  1. CUNI at MediaEval 2012: Search and Hyperlinking Task
     Petra Galuščáková and Pavel Pecina
     Institute of Formal and Applied Linguistics, Charles University in Prague
     {galuscakova,pecina}@ufal.mff.cuni.cz
  2. Search and Hyperlinking Task
     ● The Search and Hyperlinking task has two parts:
        ● Search Subtask: look up the relevant segment in the set of visual data.
        ● Hyperlinking Subtask: then possibly find other video segments related to the retrieved one.
     ● We participated in the Search Subtask only.
     ● Both transcripts (LIMSI and LIUM) were used.
     ● We did not use concept recognition, shot segmentation, or face detection.
  3. Segmentation
     ● The exact relevant passage in the recording should be retrieved → the transcripts were first divided into segments.
     ● The IR system was then used for retrieval over the collection of these segments.
     ● Two segmentation strategies:
        ● Regular segmentation according to time
        ● TextTiling
  4. Regular Segmentation
     ● Segments of 45, 60, 90 and 120 seconds.
     ● Segments were partially overlapping:
        ● A new segment was started every 30 seconds.
        ● A segment was removed from the list of retrieved segments if it partially overlapped with one of the higher-ranked segments.
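The regular segmentation and the overlap filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Segment` class, the function names, and the recording identifier are hypothetical, and the filter assumes that a segment is compared only against higher-ranked segments that were themselves kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    recording: str  # hypothetical recording identifier
    start: float    # seconds from the beginning of the recording
    end: float


def make_segments(recording, duration, length=60.0, step=30.0):
    """Cut one recording into overlapping fixed-length windows."""
    segments = []
    start = 0.0
    while start < duration:
        # The last windows are clipped to the end of the recording.
        segments.append(Segment(recording, start, min(start + length, duration)))
        start += step
    return segments


def overlaps(a, b):
    """True when two segments of the same recording share any time span."""
    return a.recording == b.recording and a.start < b.end and b.start < a.end


def drop_overlapping(ranked):
    """Keep a segment only if it overlaps no higher-ranked kept segment."""
    kept = []
    for seg in ranked:  # ranked is ordered best-first
        if not any(overlaps(seg, k) for k in kept):
            kept.append(seg)
    return kept
```

With `length=60` and `step=30`, a 120-second recording yields windows starting at 0, 30, 60 and 90 seconds. Segments that only touch at a boundary (one ends where the next starts) are not treated as overlapping here, which is a design choice of this sketch.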
  5. TextTiling Segmentation
     ● Good results were achieved with it in the RSR MediaEval Track in 2011 [Eskevich et al., 2012].
     ● The transcripts were first preprocessed and sentence boundaries (based mainly on punctuation) were marked.
     ● Settings used:
        ● the average number of words in a sentence was set to 27, and
        ● the average number of sentences in one segment was set to 9.
        ● These settings correspond better to the 90-second segments.
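The two TextTiling settings above imply a target segment size in words, which can be worked out directly. The speaking rate computed at the end is only what the two settings and the 90-second comparison imply together; it is not a figure reported on the slides.

```python
WORDS_PER_SENTENCE = 27     # setting reported on the slide
SENTENCES_PER_SEGMENT = 9   # setting reported on the slide
TARGET_SECONDS = 90         # segment length the settings are compared to

# Average number of words in one TextTiling segment under these settings.
words_per_segment = WORDS_PER_SENTENCE * SENTENCES_PER_SEGMENT

# Implied speaking rate if such a segment lasts 90 seconds (derived, not reported).
implied_rate = words_per_segment / TARGET_SECONDS  # words per second

print(words_per_segment)       # 243
print(round(implied_rate, 2))  # 2.7
```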
  6. Terrier
     ● The Terrier information retrieval system was used.
     ● http://terrier.org
     ● It offers a wide range of search engines, language models and features.
     ● The highest scores were achieved with the Hiemstra language model and the TF-IDF search engine.
     ● Terrier settings: we used the Porter stemmer, a stopword list, query expansion, and implicit parameters for both the TF-IDF search engine and the Hiemstra language model.
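For readers unfamiliar with the Hiemstra language model, the sketch below scores one segment against a query using the common textbook form of the model. It is not Terrier's code: the function name and arguments are hypothetical, the smoothing weight `lam=0.15` is an assumed value, and stemming, stopword removal and query expansion are left out.

```python
import math
from collections import Counter


def hiemstra_score(query_terms, doc_terms, collection_tf, collection_len, lam=0.15):
    """Textbook Hiemstra LM score; lam is an assumed smoothing weight."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)         # term frequency in the segment
        cf = collection_tf.get(term, 0)  # term frequency in the whole collection
        if tf == 0 or cf == 0:
            continue  # the term contributes log(1) = 0
        score += math.log(
            1.0 + (lam * tf * collection_len) / ((1.0 - lam) * cf * doc_len)
        )
    return score
```

A segment that contains a query term gets a positive contribution for it, larger when the term is rare in the collection and the segment is short; a segment without any query term scores 0.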
  7. Experiments
  8. Results

     Run  Tran.  Eng.   Seg.  MRR               mGAP                    MASP
                              60    30    10    60    30    10    Mod   60    30    10
     -    LIMSI  Hiem   No    0.34  0.27  0.10  0.21  0.10  0     0.57  0     0     0
     1    LIMSI  TFIDF  90s   0.42  0.31  0.15  0.26  0.16  0.03  0.56  0.11  0.08  0.04
     2    LIUM   Hiem   60s   0.38  0.34  0.19  0.26  0.17  0.03  0.50  0.11  0.11  0.06
     3    LIMSI  TFIDF  60s   0.47  0.40  0.19  0.31  0.20  0.04  0.62  0.16  0.14  0.06
     4    LIMSI  Hiem   90s   0.47  0.36  0.19  0.29  0.19  0.04  0.64  0.12  0.09  0.04
     5    LIMSI  Hiem   TT    0.28  0.26  0.2   0.21  0.16  0.03  0.37  0.16  0.16  0.15

     ● Runs 1 and 2 were required; only the title field of the query was used.
     ● The other three runs also used the short title field.
     ● In all cases, metadata (description and tags) was added to each segment.
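The MRR columns in the table are reported for window sizes of 60, 30 and 10 seconds. The sketch below shows one way a windowed MRR can be computed, assuming that a retrieved segment counts as a hit when it comes from the right recording and its start time lies within the window of the known relevant start. This hit criterion, the function names and the data layout are assumptions made for illustration; the official evaluation may define a hit differently, and mGAP and MASP are not covered here.

```python
def reciprocal_rank(ranked, relevant, window):
    """ranked: list of (recording, start) best-first; relevant: (recording, start)."""
    rel_rec, rel_start = relevant
    for rank, (rec, start) in enumerate(ranked, start=1):
        if rec == rel_rec and abs(start - rel_start) <= window:
            return 1.0 / rank  # first hit determines the score
    return 0.0  # no retrieved segment starts within the window


def mean_reciprocal_rank(runs, window):
    """runs: list of (ranked, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel, window) for r, rel in runs) / len(runs)
```

Shrinking the window can only turn hits into misses or push the first hit further down the list, which is consistent with every MRR row in the table decreasing from window 60 to window 10.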
  9. Observations
     ● The highest MRR and mGAP scores were achieved with regular segmentation.
     ● The highest MASP score was achieved with TextTiling segmentation.
     ● The differences between the scores of the TF-IDF engine with 60-second segments and the Hiemstra LM with 90-second segments are very small for the MRR and mGAP measures but larger for the MASP measure.
  10. Segment Length
     ● Shorter segments achieve higher mGAP and MASP scores, and this dependency is more pronounced for the MASP measure.
     ● The MRR score is highest for the 90-second segments.
     ● Window size: 60 seconds.
  11. Future Work
     ● In the future we would especially like to increase the mGAP and MASP scores → we would like to improve the segmentation precision.
     ● Use audio and visual information (e.g. shot segmentation).
     ● Examine shorter segments.
  12. Conclusions
  13. Conclusions
     ● Two types of segmentation: regular segmentation according to time, and TextTiling.
     ● The Terrier IR system was used, with the Hiemstra LM and the TF-IDF search engine.
     ● The highest MRR and mGAP scores were achieved with regular segmentation (60 and 90 seconds), whereas the TextTiling segmentation algorithm achieved the highest MASP scores.
     ● The dependency of the measures on segment length was examined.
  14. Thank you