Multimodal Features for Search and 
Hyperlinking of Video Content 
Petra Galuščáková 
galuscakova@ufal.mff.cuni.cz 
Institute of Formal and Applied Linguistics 
Charles University in Prague 
29. 10. 2014
2 
Outline 
● Speech Retrieval and Hyperlinking 
● Data and Evaluation 
● System Description 
● Passage Retrieval, Segmentation of Recordings 
● Visual and Prosodic Information
3 
Speech Retrieval and Hyperlinking
4 
Search in Audio-Visual 
Documents 
● Input: 
● Data collection (video recordings) 
● Query 
– Given as text 
● Output: 
● Relevant segments (passages) of documents 
● E.g. “Children out on poetry trip Exploration of poetry by 
school children Poem writing”, “Space-Cowboys Space Pirates 
Pirates in Space talking music”, “animal park, kenya marathon, 
wildlife reserve”
5 
Hyperlinking 
● Input: 
● Data collection (video recordings) 
● Query segment 
● Output: 
● Segments similar to the query 
segment
6 
Data and Evaluation
7 
● MediaEval is a benchmarking initiative dedicated to the 
development, comparison, and improvement of strategies for 
processing and retrieving multimedia content. 
● E.g. speech recognition, multimedia content analysis, music and 
audio analysis, user-contributed information (tags, tweets), viewer 
affective response, social networks, temporal and geo-coordinates 
● 2012 MediaEval Search and Hyperlinking Task 
● 2013 MediaEval Search and Hyperlinking Task 
● 2013 Similar Segments in Social Speech Task 
● 2014 MediaEval Search and Hyperlinking Task
8 
Search and Hyperlinking Task 
● The main goal of the Search subtask: 
● Find passages relevant to a user’s interest, given as a textual query, in 
a large set of audio-visual recordings 
● The main goal of the Hyperlinking subtask: 
● Find further passages similar to the retrieved ones 
● Scenario: 
● A user wants to find a piece of information relevant to a given query 
in a collection of TV programmes (Search subtask) 
● And then navigate through a large archive using hyperlinks to the 
retrieved segments (Hyperlinking subtask)
9 
Search and Hyperlinking Task 
2014 Data 
● TV programme recordings provided by BBC 
● All BBC programmes broadcast during 4 months 
● 1335 hours for training, 2686 hours for testing 
● Subtitles and three ASR transcripts (LIMSI, LIUM, and NST 
Sheffield) 
● Metadata, detected shots, stable keyframes, prosodic features 
● Search: 50 training and 30 test queries 
● E.g. sightseeing london, egypt travel, celebrity diet 
● Hyperlinking: 30 training and 30 test queries 
● Given as a query segment (beginning and end)
10 
Evaluation 
● Full document retrieval → MRR 
● RR = 1 / rank of the first correctly retrieved document 
● MRR = average of the RR values for the set of the queries 
● Retrieval of the exact passages → MRR-window 
● A retrieved segment is considered correctly retrieved only if its 
starting point lies less than 60 seconds from the starting point of 
the relevant segment 
● RRw = 1 / rank of the first segment retrieved correctly in this sense 
● MRRw = average of the RRw values for the set of the queries 
● Retrieval of the exact passages → mGAP, MASP 
● Takes into account the exact beginning (end) of a relevant 
segment
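The MRR and MRRw definitions above can be sketched in a few lines. This is a minimal illustration, assuming one relevant segment per query; the function names and the `(recording id, start time)` tuple layout are hypothetical and not taken from the task's official evaluation scripts.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(ranked_doc_ids: Sequence[str], relevant_doc_id: str) -> float:
    """RR = 1 / rank of the first correctly retrieved document (0 if none)."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id == relevant_doc_id:
            return 1.0 / rank
    return 0.0


def reciprocal_rank_window(
    ranked_segments: Sequence[Tuple[str, float]],
    relevant: Tuple[str, float],
    window: float = 60.0,
) -> float:
    """RRw: a hit must be in the right recording and start < `window` s away."""
    rel_doc, rel_start = relevant
    for rank, (doc_id, start) in enumerate(ranked_segments, start=1):
        if doc_id == rel_doc and abs(start - rel_start) < window:
            return 1.0 / rank
    return 0.0


def mean(values: List[float]) -> float:
    """Average over queries; gives MRR from RR values and MRRw from RRw values."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical toy run: two queries, each with a ranked list and one answer.
rr_values = [
    reciprocal_rank(["rec_a", "rec_b"], "rec_b"),
    reciprocal_rank(["rec_c"], "rec_c"),
]
mrr = mean(rr_values)
```

The window test is what separates MRRw from MRR: a system that returns the right recording but a start point minutes away from the relevant passage still scores under MRR, but not under MRRw.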
11 
Evaluation Cont. 
● MAP, P5, P10, P20 
● MAP-bin 
● MAP-tol 
Aly R., Eskevich M., Ordelman R., and Jones G.J.F.: Adapting Binary Information Retrieval Evaluation Metrics for Segment-based 
Retrieval Tasks. Technical Report, 2013.
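For orientation, precision at a cut-off (P5, P10, P20) and average precision can be sketched as below. This is a generic textbook formulation under the assumption that each retrieved segment has already been judged relevant (1) or not (0); how that judgement is made is exactly where the segment-based variants named above differ, and it is left to the caller here. Function names are hypothetical.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k results judged relevant (P5 uses k=5, and so on)."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int], num_relevant: int) -> float:
    """Mean of the precision values at each rank where a relevant result occurs.

    `num_relevant` is the number of relevant items for the query, supplied
    by the caller (an assumption of this sketch).
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(per_query_ap: Sequence[float]) -> float:
    """MAP = average of the per-query average precision values."""
    return sum(per_query_ap) / len(per_query_ap) if per_query_ap else 0.0
```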
12 
System Description
13 
Passage Retrieval 
● Documents are automatically divided into shorter segments 
● Segments serve as documents in the traditional IR setup 
● The segmentation is crucial for the quality of the retrieval 
– Especially the segment length 
→ We focus on the segmentation strategies
14 
Effect of Passage Retrieval 
Segm.     Manual                    ASR 
          MRR     MRRw    mGAP      MRR     MRRw    mGAP 
None      0.879   0.315   0.029     0.858   0.333   0.027 
Manual    0.897   0.671   0.277     0.885   0.669   0.247 
● Segmentation can substantially improve retrieval of the segment 
beginnings (MRRw and mGAP measures) 
● Segmentation can also improve retrieval of full recordings (MRR 
measure) 
Similar Segments in Social Speech Task 2013
15 
Baseline System 
● We employ the Terrier IR toolkit 
● Hiemstra language model 
● Parameter set to 0.35 (importance of a query term in a document) 
● Stopwords removal, stemming 
● Post-filtering of the answers
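To make the role of the 0.35 parameter concrete, here is a minimal sketch of one commonly cited form of the Hiemstra language model weighting, score = Σ log(1 + λ·tf·|C| / ((1−λ)·cf·|d|)). It assumes tokenised, stemmed, stopword-free input and is not Terrier's code; whether it matches Terrier's implementation term for term has not been checked, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def hiemstra_lm_score(
    query_terms: List[str],
    doc_tokens: List[str],
    collection_tf: Dict[str, int],
    collection_len: int,
    lam: float = 0.35,  # importance of a query term in a document
) -> float:
    """Sum over query terms of log(1 + lam*tf*|C| / ((1-lam)*cf*|d|))."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        cf = collection_tf.get(term, 0)  # term frequency in the whole collection
        if cf == 0 or doc_len == 0:
            continue  # unseen terms contribute nothing in this sketch
        score += math.log(
            1.0 + (lam * tf[term] * collection_len) / ((1.0 - lam) * cf * doc_len)
        )
    return score


# Hypothetical two-segment collection.
segments = [["poetry", "trip"], ["space", "pirates"]]
collection_tf = Counter(tok for seg in segments for tok in seg)
collection_len = sum(len(seg) for seg in segments)
scores = [
    hiemstra_lm_score(["poetry"], seg, collection_tf, collection_len)
    for seg in segments
]
```

A larger λ gives more weight to the term's frequency in the segment itself and less to the collection-wide background model.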
16 
Post-filtering Effect 
● MAP, P5, P10 and P20 are notably higher in the experiments in 
which we did not remove partially overlapping segments 
● These measures do not distinguish whether a user has 
already seen the retrieved segment 
● Overlapping segments are expected to be less 
beneficial for the users 
Transcript  Filtering  MAP      P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   Yes        0.3692   0.7467  0.7133  0.6050  0.2606   0.2157 
Subtitles   No         16.3486  0.8400  0.8367  0.8433  0.3172   0.0515 
Search and Hyperlinking Task 2014 (Search subtask)
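The post-filtering step compared above can be sketched as follows. The slides only state that partially overlapping segments are removed; the exact rule used here (keep a segment only if it does not overlap any higher-ranked segment already kept from the same recording) is an assumption, as are the names and the tuple layout.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (recording id, start in s, end in s)


def remove_overlapping(ranked: List[Segment]) -> List[Segment]:
    """Filter a ranked list (best first), dropping overlaps with kept results."""
    kept: List[Segment] = []
    for doc_id, start, end in ranked:
        overlaps = any(
            d == doc_id and start < e and s < end  # intervals share some time
            for d, s, e in kept
        )
        if not overlaps:
            kept.append((doc_id, start, end))
    return kept


# Hypothetical ranked list with one overlapping pair in recording "a".
ranked = [("a", 0.0, 60.0), ("a", 30.0, 90.0), ("b", 0.0, 60.0), ("a", 60.0, 120.0)]
filtered = remove_overlapping(ranked)
```

Segments that merely touch at a boundary (one ends at 60 s, the next starts at 60 s) are treated as non-overlapping in this sketch.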
17 
Baseline System - Hyperlinking 
● Transformed into Search subtask 
● Query segment is transformed into a textual query by 
including all the words of the subtitles lying within the 
segment boundary 
● Queries created on subtitles outperform ASR queries 
● Even if we run the retrieval on the ASR transcripts
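The transformation of a query segment into a textual query can be sketched as below. It assumes the subtitles are available as `(time in seconds, word)` pairs; that layout, the function name, and the optional `context` padding parameter are hypothetical. The padding mirrors the idea, mentioned later in the deck, of also using words from before the segment beginning and after the segment end.

```python
from typing import List, Tuple

TimedWord = Tuple[float, str]  # (time in seconds, word)


def segment_to_query(
    words: List[TimedWord], start: float, end: float, context: float = 0.0
) -> str:
    """Join all words whose timestamp lies within the (padded) segment."""
    lo, hi = start - context, end + context
    return " ".join(tok for t, tok in words if lo <= t <= hi)


# Hypothetical subtitle words of one recording.
subtitle_words = [(5.0, "wildlife"), (12.0, "reserve"), (40.0, "marathon")]
query = segment_to_query(subtitle_words, start=10.0, end=20.0)
query_with_context = segment_to_query(subtitle_words, 10.0, 20.0, context=10.0)
```

The resulting string can then be submitted to the same retrieval pipeline as a Search query.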
18 
System Tuning 
● Metadata 
● Concatenate metadata with each segment 
● Title, episode title, description, short episode synopsis, 
service name and program variant 
● In Hyperlinking: Concatenate metadata with the query 
segment 
● Context 
● In Hyperlinking: use 200 seconds before the segment 
beginning and after the segment end
19 
System Tuning Cont. 
● Search 
Transcript  Tuning            MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   None              0.4209  0.7933  0.7433  0.5950  0.3192   0.3155 
Subtitles   Metadata          0.5127  0.7467  0.7267  0.6100  0.3538   0.3023 
● Hyperlinking 
Transcript  Tuning            MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   None              0.1147  0.3071  0.2786  0.2036  0.1021   0.0792 
Subtitles   Metadata+Context  0.4072  0.8067  0.7000  0.5417  0.2611   0.2237 
Search and Hyperlinking Task 2014
20 
Segmentation Strategies
21 
Segmentation Types 
● Fixed-length (Window-based) 
● Segments of equal length with a regular shift 
● Claimed to be a very effective approach 
● Similarity-based 
● Measure the similarity between neighboring segments (e.g. cosine 
distance) 
● Algorithms TextTiling and C99 
● Lexical-chain-based 
● A sequence of lexicographically related word occurrences 
● Feature-based
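As an illustration of the similarity-based family listed above, the sketch below scores each gap between sentences by the cosine similarity of the word counts on either side and proposes a boundary where the similarity is low. It is a deliberately simplified stand-in and not an implementation of TextTiling or C99; the block size, the threshold, and all names are assumptions.

```python
import math
from collections import Counter
from typing import List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences: List[List[str]], block: int = 2) -> List[float]:
    """Similarity across each gap, using `block` sentences on either side."""
    sims = []
    for gap in range(1, len(sentences)):
        left = Counter(t for s in sentences[max(0, gap - block):gap] for t in s)
        right = Counter(t for s in sentences[gap:gap + block] for t in s)
        sims.append(cosine_similarity(left, right))
    return sims


def propose_boundaries(sims: List[float], threshold: float) -> List[int]:
    """Index i means: a boundary is proposed before sentence i."""
    return [i + 1 for i, s in enumerate(sims) if s < threshold]


# Hypothetical tokenised sentences with a topic change in the middle.
sentences = [
    ["poetry", "school"], ["poetry", "children"],
    ["space", "pirates"], ["space", "music"],
]
sims = gap_similarities(sentences, block=1)
boundaries = propose_boundaries(sims, threshold=0.25)
```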
22 
Fixed-Length Segmentation 
Comparison 
● S – Sentence 
● Sh – Shot 
● Sp – Speech Segment 
● TP – Time + Pause 
● TO – Time + Overlap (Fixed- 
Length Segment) 
M. Eskevich et al.: Multimedia information seeking through 
search and hyperlinking, ICMR 2013. 
Search and Hyperlinking Task 2012 (Search Subtask)
23 
Fixed-length Segmentation 
Segment Length 
Search and Hyperlinking Task 2013 (Search subtask)
24 
Fixed-length Segmentation 
Segment Shift 
Search and Hyperlinking Task 2013 (Search subtask)
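Fixed-length segmentation is controlled by two parameters, segment length and segment shift. A minimal sketch, assuming a transcript given as `(time in seconds, word)` pairs; the default shift and the function name are hypothetical, while the 60-second length is the fixed-length setting reported elsewhere in the deck.

```python
from typing import List, Tuple

TimedWord = Tuple[float, str]  # (time in seconds, word)


def fixed_length_segments(
    words: List[TimedWord], length: float = 60.0, shift: float = 10.0
) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) windows of `length` s, one every `shift` s."""
    if not words:
        return []
    last_time = words[-1][0]
    segments = []
    start = 0.0
    while start <= last_time:
        end = start + length
        tokens = [tok for t, tok in words if start <= t < end]
        if tokens:  # skip windows that contain no speech
            segments.append((start, end, " ".join(tokens)))
        start += shift
    return segments


# Hypothetical short transcript, segmented with 10 s windows shifted by 5 s.
toy_words = [(0.0, "a"), (5.0, "b"), (12.0, "c")]
toy_segments = fixed_length_segments(toy_words, length=10.0, shift=5.0)
```

With a shift smaller than the length, the windows overlap, so the same word is indexed in several segments; this is why overlapping results can appear in the ranked list and why post-filtering matters.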
25 
Feature-based Segmentation
26 
Feature-based Segmentation 
● We identify possible segment boundaries (beginnings and 
ends) 
● J48 decision trees (almost equivalent to C4.5), Weka 
framework 
● Training data available for the Similar Segments in Social 
Speech Task, MediaEval 2013 
● Manually marked segments 
● Conversations between university students 
● Binary classification problem 
● For each word in the transcripts, we predict whether a 
segment boundary occurs after this word 
● Classes: segment boundary and segment continuation
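The binary classification setup above needs one label per word. A minimal sketch of how such labels could be derived from manually marked segment ends, assuming word timestamps in seconds; the matching rule (a boundary belongs to the last word that starts at or before it) and the function name are assumptions, not the authors' procedure.

```python
from typing import List


def boundary_labels(word_times: List[float], segment_ends: List[float]) -> List[int]:
    """Label 1 if a marked segment end falls after this word and before the next."""
    labels = []
    for i, t in enumerate(word_times):
        next_t = word_times[i + 1] if i + 1 < len(word_times) else float("inf")
        is_boundary = any(t <= e < next_t for e in segment_ends)
        labels.append(1 if is_boundary else 0)  # 1 = boundary, 0 = continuation
    return labels


# Hypothetical word timestamps and two manually marked segment ends.
labels = boundary_labels([0.0, 1.0, 2.5, 4.0], [1.2, 4.5])
```

These labels, paired with per-word features, form the training set for a classifier such as the decision tree named above.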
27 
Used Features 
● Cue words and tags 
● N-grams which frequently appear at segment boundaries 
● N-grams most informative for segment boundaries 
● Manually defined n-grams 
● Letter cases 
● Length of the silence before the word 
● Measured as a difference between timestamps of two adjacent 
words 
● Division given in transcripts (e.g., speech segments defined in 
the LIMSI transcripts) 
● The output of the TextTiling algorithm
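A few of the features listed above can be computed directly from a timestamped transcript. The sketch below covers three of them: the silence before the word (difference between timestamps of adjacent words), letter case, and membership in a cue-word list. Feature names, the input layout, and the cue list are hypothetical; the n-gram, tag, transcript-division, and TextTiling features are omitted.

```python
from typing import Dict, List, Set, Tuple, Union

TimedWord = Tuple[float, str]  # (time in seconds, word)
Features = Dict[str, Union[float, bool]]


def word_features(words: List[TimedWord], cue_words: Set[str]) -> List[Features]:
    """One feature dictionary per word, in transcript order."""
    features: List[Features] = []
    prev_time = None
    for time, token in words:
        # Silence is measured as the gap between adjacent word timestamps.
        silence = 0.0 if prev_time is None else time - prev_time
        features.append({
            "silence_before": silence,
            "is_capitalized": token[:1].isupper(),
            "is_cue_word": token.lower() in cue_words,
        })
        prev_time = time
    return features


# Hypothetical transcript fragment and cue list.
feats = word_features([(0.0, "So"), (0.5, "if"), (1.5, "Prague")], {"if"})
```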
28 
Most Informative Features 
● Division defined in the transcripts 
● The length of silence 
● Especially if it is longer than a threshold (300 ms, 400 ms, 500 ms, 600 ms) 
● TextTiling algorithm output 
● Segment beginnings: “if”, “I’m”, “especially”, “the”, “are you”, 
“you have”, “VBP PRP VBG”, … 
● Segment ends: “good”, “interesting”, “lot”, …
29 
Feature-based Segmentation 
Approaches
30 
Feature-based Segmentation 
Approaches Comparison 
Beg.   End    MRR    MRRw   mGAP   #Seg     Len [s] 
--     --     0.656  0.052  0.027  2 k      2531.6 
Reg    Reg    0.671  0.388  0.245  234 k    49.5 
ML     --     0.549  0.117  0.060  3125 k   2.3 
--     ML     0.607  0.310  0.192  280 k    29.0 
ML     B+50   0.685  0.412  0.272  5820 k   49.6 
E+50   ML     0.715  0.428  0.298  2580 k   49.6 
ML     ML     0.626  0.392  0.229  5659 k   20.2 
Search and Hyperlinking Task 2013 (Search subtask), Results on the subtitles
31 
Feature-based Segmentation vs. 
Fixed-Length Segmentation 
● Search Task 
Transcript  Segm.     Seg. Len.  MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   Fixed     60s        0.5127  0.7467  0.7267  0.6100  0.3538   0.3023 
Subtitles   Features  50s        0.8028  0.7867  0.7667  0.6933  0.3199   0.2350 
● Hyperlinking Task 
Transcript  Segm.     Seg. Len.  MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   Fixed     60s        0.4366  0.8667  0.7700  0.5633  0.2724   0.2580 
Subtitles   Features  50s        0.8253  0.8867  0.8567  0.7383  0.2525   0.1991 
Search and Hyperlinking Task 2014
32 
Visual Information in 
Segmentation 
● Training data used for segmentation tuning are visually static 
→ visual information would not be helpful there 
● Create segments only if the visual similarity between adjacent 
segments < weight 
● Tune the weight on the Search and Hyperlinking training 
data 
Transcript  Segm.            MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   Features         0.8028  0.7867  0.7667  0.6933  0.3199   0.2350 
Subtitles   Features+Visual  0.7701  0.7600  0.7500  0.6733  0.3285   0.2530 
Search and Hyperlinking Task 2014 (Search Subtask)
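The rule of creating a segment only when the visual similarity between adjacent segments is below the weight can be read as a merge step over candidate segments. The sketch below follows that reading; how the similarity values are obtained, the merge procedure itself, and all names are assumptions rather than the authors' implementation.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start in s, end in s)


def merge_visually_similar(
    segments: List[Span], similarities: List[float], weight: float
) -> List[Span]:
    """Keep the boundary between neighbours only if their similarity < weight.

    similarities[i] is the visual similarity between segments[i] and
    segments[i + 1]; neighbours at or above the weight are merged.
    """
    if not segments:
        return []
    merged = [segments[0]]
    for segment, sim in zip(segments[1:], similarities):
        if sim < weight:
            merged.append(segment)  # visually different: start a new segment
        else:
            prev_start, _ = merged[-1]
            merged[-1] = (prev_start, segment[1])  # visually similar: extend
    return merged


# Hypothetical candidate segments and pairwise visual similarities.
result = merge_visually_similar(
    [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0)], [0.9, 0.2], weight=0.5
)
```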
33 
Visual and Prosodic Similarity
34 
Visual Similarity
35 
Visual Similarity Cont. 
● We use Feature Signatures and Signature Quadratic Form 
Distance 
http://siret.ms.mff.cuni.cz
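For readers unfamiliar with the Signature Quadratic Form Distance, a minimal sketch follows. A feature signature is taken to be a list of `(centroid, weight)` pairs, and the distance is sqrt(w·A·wᵀ), where w concatenates the weights of the first signature with the negated weights of the second and A holds pairwise centroid similarities. The Gaussian similarity function and the `alpha` value are assumptions; the slides do not say which similarity function was used.

```python
import math
from typing import List, Sequence, Tuple

Signature = List[Tuple[Sequence[float], float]]  # [(centroid, weight), ...]


def sqfd(sig_a: Signature, sig_b: Signature, alpha: float = 1.0) -> float:
    """Signature Quadratic Form Distance with a Gaussian similarity kernel."""
    centroids = [c for c, _ in sig_a] + [c for c, _ in sig_b]
    weights = [w for _, w in sig_a] + [-w for _, w in sig_b]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            sq_dist = sum((x - y) ** 2 for x, y in zip(ci, cj))
            total += wi * wj * math.exp(-alpha * sq_dist)
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding


# Hypothetical signatures, each with a single 2-D centroid.
sig_1 = [((0.0, 0.0), 1.0)]
sig_2 = [((1.0, 0.0), 1.0)]
same = sqfd(sig_1, sig_1)
different = sqfd(sig_1, sig_2)
```

Because signatures may contain different numbers of centroids, this distance can compare keyframes whose visual content was clustered into different numbers of regions.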
36 
Visual Similarity Results 
Transcript  Meta.  Weights  MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   No     None     0.1618  0.4786  0.4107  0.2893  0.1423   0.1216 
Subtitles   No     Visual   0.1660  0.4929  0.4143  0.3000  0.1483   0.1245 
Subtitles   Yes    None     0.4301  0.8600  0.7767  0.5483  0.2689   0.2465 
Subtitles   Yes    Visual   0.4366  0.8667  0.7700  0.5633  0.2724   0.2580 
LIMSI       Yes    None     0.4166  0.8533  0.7133  0.5450  0.2659   0.2297 
LIMSI       Yes    Visual   0.4168  0.8667  0.7333  0.5400  0.2692   0.2414 
LIUM        Yes    None     0.4226  0.8333  0.7300  0.5433  0.2593   0.2547 
LIUM        Yes    Visual   0.4212  0.8400  0.7367  0.5350  0.2622   0.2632 
NST         Yes    None     0.4072  0.8067  0.7000  0.5417  0.2611   0.2237 
NST         Yes    Visual   0.4160  0.8267  0.7167  0.5483  0.2655   0.2440 
Search and Hyperlinking Task 2014 (Hyperlinking subtask)
37 
Visual Similarity Results - 
Positive Query Examples
38 
Visual Similarity Results - 
Negative Query Examples
39 
Prosodic Similarity
40 
Prosodic Similarity Results 
Transcript  Meta.  Weights   MAP     P5      P10     P20     MAP-bin  MAP-tol 
Subtitles   Yes    None      0.4301  0.8600  0.7767  0.5483  0.2689   0.2465 
Subtitles   Yes    Prosodic  0.4321  0.8533  0.7767  0.5517  0.2687   0.2473 
● Small but promising improvement 
Search and Hyperlinking Task 2014 (Hyperlinking subtask)
41 
System Comparison
42 
Search Task
43 
Hyperlinking Task
44 
Conclusion
45 
Conclusion 
● Passage Retrieval 
● Improves retrieval of relevant segments 
● Can improve retrieval of full recordings 
● Segmentation approach is crucial for the retrieval 
● Fixed-length segmentation works well 
● Feature-based segmentation outperforms fixed-length 
segmentation 
● Visual and prosodic similarity can improve results of text-based 
retrieval
46 
Thank you 
This research is supported by the Charles University Grant Agency (GA UK n. 920913)

  • 2. Outline ● Speech Retrieval and Hyperlinking ● Data and Evaluation ● System Description ● Passage Retrieval, Segmentation of Recordings ● Visual and Prosodic Information
  • 3. Speech Retrieval and Hyperlinking
  • 4. Search in Audio-Visual Documents ● Input: ● Data collection (video recordings) ● Query – Given as text ● Output: ● Relevant segments (passages) of documents ● Example queries: “Children out on poetry trip Exploration of poetry by school children Poem writing”, “Space-Cowboys Space Pirates Pirates in Space talking music”, “animal park, kenya marathon , wildlife reserve”
  • 5. Hyperlinking ● Input: ● Data collection (video recordings) ● Query segment ● Output: ● Segments similar to the query segment
  • 6. Data and Evaluation
  • 7. ● MediaEval is a benchmarking initiative dedicated to the development, comparison, and improvement of strategies for processing and retrieving multimedia content ● E.g. speech recognition, multimedia content analysis, music and audio analysis, user-contributed information (tags, tweets), viewer affective response, social networks, temporal and geo-coordinates ● 2012 MediaEval Search and Hyperlinking Task ● 2013 MediaEval Search and Hyperlinking Task ● 2013 Similar Segments in Social Speech Task ● 2014 MediaEval Search and Hyperlinking Task
  • 8. Search and Hyperlinking Task ● The main goal of the Search subtask: ● Find passages relevant to a user’s interest, given by a textual query, in a large set of audio-visual recordings ● And of the Hyperlinking subtask: ● Find more passages similar to the retrieved ones ● Scenario: ● A user wants to find a piece of information relevant to a given query in a collection of TV programmes (Search subtask) ● And then navigate through a large archive using hyperlinks to the retrieved segments (Hyperlinking subtask)
  • 9. Search and Hyperlinking Task 2014 Data ● TV programme recordings provided by the BBC ● All BBC programmes broadcast during 4 months ● 1335 hours for training, 2686 hours for testing ● Subtitles and three ASR transcripts (LIMSI, LIUM, and NST Sheffield) ● Metadata, detected shots, stable keyframes, prosodic features ● Search: 50 training and 30 test queries ● E.g. sightseeing london, egypt travel, celebrity diet ● Hyperlinking: 30 training and 30 test queries ● Given as a query segment (beginning and end)
  • 10. Evaluation ● Full document retrieval → MRR ● RR = 1 / rank of the first correctly retrieved document ● MRR = average of the RR values over the set of queries ● Retrieval of the exact passages → MRR-window ● A retrieved segment counts as correctly retrieved only if its starting point lies less than 60 seconds from the starting point of the relevant segment ● MRRw = average of the RRw values over the set of queries ● Retrieval of the exact passages → mGAP, MASP ● Take into account the exact beginning (end) of a relevant segment
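The MRR and MRR-window definitions on this slide can be sketched in Python as follows. This is only an illustration and not the official MediaEval evaluation tool: the function names, the (recording id, start time) representation of segments, and the toy data are assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

# A segment is identified here by (recording id, start time in seconds);
# this representation is an assumption made for the example.
Segment = Tuple[str, float]


def reciprocal_rank(ranked_docs: List[str], relevant_docs: Set[str]) -> float:
    """RR = 1 / rank of the first correctly retrieved document (0 if none)."""
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            return 1.0 / rank
    return 0.0


def reciprocal_rank_window(ranked: List[Segment],
                           relevant: List[Segment],
                           window_s: float = 60.0) -> float:
    """RRw: a retrieved segment is correct only if it starts less than
    window_s seconds from the start of a relevant segment in the same recording."""
    for rank, (doc, start) in enumerate(ranked, start=1):
        for rel_doc, rel_start in relevant:
            if doc == rel_doc and abs(start - rel_start) < window_s:
                return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    """MRR / MRRw = average of the per-query RR / RRw values."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy data (hypothetical): one query, one relevant segment.
ranked_segments = [("v1", 200.0), ("v2", 30.0), ("v2", 100.0)]
relevant_segments = [("v2", 80.0)]
rr = reciprocal_rank([doc for doc, _ in ranked_segments], {"v2"})
rrw = reciprocal_rank_window(ranked_segments, relevant_segments)
```

With the 60-second window, the second result starts 50 seconds from the relevant start, so both RR and RRw are 1/2 for this toy query.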
  • 11. Evaluation Cont. ● MAP, P5, P10, P20 ● MAP-bin ● MAP-tol ● Aly R., Eskevich M., Ordelman R., and Jones G.J.F.: Adapting Binary Information Retrieval Evaluation Metrics for Segment-based Retrieval Tasks. Technical Report, 2013.
  • 13. Passage Retrieval ● Documents are automatically divided into shorter segments ● Segments serve as documents in the traditional IR setup ● The segmentation is crucial for the quality of the retrieval – Especially the segment length → We focus on the segmentation strategies
  • 14. Effect of Passage Retrieval (Similar Segments in Social Speech Task 2013) ● Segmentation can substantially improve retrieval of the segment beginnings (MRRw and mGAP measures) ● Segmentation can also improve retrieval of full recordings (MRR measure)
      Segm.    Manual                  ASR
               MRR    MRRw   mGAP      MRR    MRRw   mGAP
      None     0.879  0.315  0.029     0.858  0.333  0.027
      Manual   0.897  0.671  0.277     0.885  0.669  0.247
  • 15. Baseline System ● We employ the Terrier IR toolkit ● Hiemstra language model ● Parameter set to 0.35 (importance of a query term in a document) ● Stopword removal, stemming ● Post-filtering of the answers
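As a rough illustration of what the Hiemstra language model computes, the sketch below scores a segment against a query using the rank-equivalent form of the interpolated model, with the parameter set to 0.35 as on the slide. It is a minimal sketch, not Terrier's implementation: the function name, the tokenised toy collection, and the omission of stopword removal and stemming are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def hiemstra_lm_score(query: List[str],
                      doc: List[str],
                      collection_tf: Dict[str, int],
                      collection_len: int,
                      lam: float = 0.35) -> float:
    """Sum over query terms of log(1 + lam*tf*|C| / ((1-lam)*cf*|d|)).

    tf = term frequency in the segment, |d| = segment length,
    cf = term frequency in the collection, |C| = collection length.
    Terms unseen in the collection are skipped.
    """
    tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        cf = collection_tf.get(term, 0)
        if cf == 0 or doc_len == 0:
            continue
        score += math.log(
            1.0 + (lam * tf[term] * collection_len) / ((1.0 - lam) * cf * doc_len)
        )
    return score


# Toy collection (hypothetical): each "document" is one segment.
segments = [["poetry", "trip", "children"], ["space", "pirates", "music"]]
collection_tf = Counter(token for seg in segments for token in seg)
collection_len = sum(len(seg) for seg in segments)
scores = [hiemstra_lm_score(["poetry"], seg, collection_tf, collection_len)
          for seg in segments]
```

A larger parameter value gives more weight to the occurrence of a query term in the segment itself and less to the collection-wide background frequency.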
  • 16. Post-filtering Effect (Search and Hyperlinking Task 2014, Search subtask) ● MAP, P5, P10 and P20 are notably higher in the experiments in which we did not remove partially overlapping segments ● These measures do not distinguish whether a user has already seen the retrieved segment ● Overlapping segments are expected to be less beneficial for users
      Transcript  Filtering  MAP      P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   Yes        0.3692   0.7467  0.7133  0.6050  0.2606   0.2157
      Subtitles   No         16.3486  0.8400  0.8367  0.8433  0.3172   0.0515
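The removal of partially overlapping segments could look roughly like the sketch below. The slides do not spell out the exact filtering rule, so this assumes a simple greedy pass: walk down the ranked list and keep a segment only if it does not overlap a higher-ranked segment from the same recording that was already kept. The `Segment` type and the function names are hypothetical.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    doc_id: str    # recording identifier
    start: float   # seconds
    end: float     # seconds
    score: float   # retrieval score


def overlaps(a: Segment, b: Segment) -> bool:
    """True if both segments come from the same recording and share any time span."""
    return a.doc_id == b.doc_id and a.start < b.end and b.start < a.end


def filter_overlapping(ranked: List[Segment]) -> List[Segment]:
    """Greedy post-filter over a list already sorted by descending score."""
    kept: List[Segment] = []
    for seg in ranked:
        if not any(overlaps(seg, k) for k in kept):
            kept.append(seg)
    return kept


# Toy ranked list (hypothetical): the second result overlaps the first.
ranked = [
    Segment("v1", 0.0, 60.0, 3.0),
    Segment("v1", 30.0, 90.0, 2.5),
    Segment("v1", 60.0, 120.0, 2.0),
    Segment("v2", 30.0, 90.0, 1.0),
]
filtered = filter_overlapping(ranked)
```

In this toy list only the second result is dropped; segments that merely touch at a boundary (60 s) are treated as non-overlapping.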
  • 17. Baseline System - Hyperlinking ● The task is transformed into the Search subtask ● The query segment is transformed into a textual query by including all the words of the subtitles lying within the segment boundaries ● Queries created from subtitles outperform queries created from ASR transcripts ● Even if we run the retrieval on the ASR transcripts
  • 18. System Tuning ● Metadata ● Concatenate metadata with each segment ● Title, episode title, description, short episode synopsis, service name and program variant ● In Hyperlinking: Concatenate metadata with the query segment ● Context ● In Hyperlinking: use 200 seconds before the segment beginning and after the segment end
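The construction of a textual query from a query segment, with optional context and metadata, can be sketched as below. This is an assumed, simplified reading of the two slides above: the word-level timestamps, the function name `build_hyperlinking_query`, and the toy data are hypothetical, and the real system works on subtitle or ASR transcripts whose format is not shown in the slides.

```python
from typing import List, Sequence, Tuple

Word = Tuple[float, str]  # (timestamp in seconds, token); assumed input format


def build_hyperlinking_query(words: List[Word],
                             seg_start: float,
                             seg_end: float,
                             context_s: float = 0.0,
                             metadata: Sequence[str] = ()) -> str:
    """Collect all transcript words inside the (optionally widened) segment
    and append the metadata fields of the recording."""
    lo, hi = seg_start - context_s, seg_end + context_s
    tokens = [token for time_s, token in words if lo <= time_s <= hi]
    for field in metadata:
        tokens.extend(field.split())
    return " ".join(tokens)


# Toy transcript (hypothetical).
transcript = [(10.0, "intro"), (100.0, "pyramids"), (250.0, "egypt"),
              (260.0, "travel"), (500.0, "credits")]
plain_query = build_hyperlinking_query(transcript, 240.0, 270.0)
tuned_query = build_hyperlinking_query(transcript, 240.0, 270.0,
                                       context_s=200.0,
                                       metadata=("Travel Show",))
```

The resulting string would then be submitted to the same retrieval pipeline as a Search query.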
  • 19. System Tuning Cont. (Search and Hyperlinking Task 2014)
      ● Search
      Transcript  Tuning            MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   None              0.4209  0.7933  0.7433  0.5950  0.3192   0.3155
      Subtitles   Metadata          0.5127  0.7467  0.7267  0.6100  0.3538   0.3023
      ● Hyperlinking
      Transcript  Tuning            MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   None              0.1147  0.3071  0.2786  0.2036  0.1021   0.0792
      Subtitles   Metadata+Context  0.4072  0.8067  0.7000  0.5417  0.2611   0.2237
  • 21. Segmentation Types ● Fixed-length (Window-based) ● Segments of equal length with a regular shift ● Claimed to be a very effective approach ● Similarity-based ● Measure the similarity between neighboring segments (e.g. cosine distance) ● Algorithms TextTiling and C99 ● Lexical-chain-based ● A sequence of lexicographically related word occurrences ● Feature-based
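Fixed-length (window-based) segmentation is simple enough to sketch directly. The function below cuts a recording into windows of equal length that start at a regular shift; the function name and the default values are assumptions for the example, not the tuned settings of the system.

```python
from typing import List, Tuple


def fixed_length_segments(duration_s: float,
                          length_s: float = 60.0,
                          shift_s: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of windows of length_s seconds,
    starting every shift_s seconds; the last windows are cut at the end
    of the recording."""
    if length_s <= 0 or shift_s <= 0:
        raise ValueError("length_s and shift_s must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        segments.append((start, min(start + length_s, duration_s)))
        start += shift_s
    return segments


# A 100-second recording, 60-second windows, 30-second shift (toy values).
windows = fixed_length_segments(100.0, length_s=60.0, shift_s=30.0)
```

When the shift is smaller than the length, neighboring windows overlap, which is why a post-filtering step for overlapping results becomes relevant.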
  • 22. Fixed-Length Segmentation Comparison (Search and Hyperlinking Task 2012, Search subtask) ● S – Sentence ● Sh – Shot ● Sp – Speech Segment ● TP – Time + Pause ● TO – Time + Overlap (Fixed-Length Segment) ● M. Eskevich et al.: Multimedia information seeking through search and hyperlinking, ICMR 2013.
  • 23. Fixed-length Segmentation: Segment Length (Search and Hyperlinking Task 2013, Search subtask)
  • 24. Fixed-length Segmentation: Segment Shift (Search and Hyperlinking Task 2013, Search subtask)
  • 26. Feature-based Segmentation ● We identify possible segment boundaries (beginnings and ends) ● J48 decision trees (almost equivalent to C4.5), Weka framework ● Training data available for the Similar Segments in Social Speech Task, MediaEval 2013 ● Manually marked segments ● Conversations between university students ● Binary classification problem ● For each word in the transcripts, we predict whether a segment boundary occurs after this word ● Classes: segment boundary and segment continuation
  • 27. Used Features ● Cue words and tags ● N-grams which frequently appear at segment boundaries ● N-grams most informative for segment boundaries ● Manually defined n-grams ● Letter cases ● Length of the silence before the word ● Measured as a difference between timestamps of two adjacent words ● Division given in transcripts (e.g., speech segments defined in the LIMSI transcripts) ● The output of the TextTiling algorithm
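A few of the per-word features listed above can be sketched as a small feature extractor. This is a hypothetical illustration only: the feature names, the one-timestamp-per-word input format, and the silence thresholds passed as a parameter are assumptions, and the classifier itself (J48 in Weka) is not reproduced here.

```python
from typing import Dict, List, Sequence, Set, Tuple, Union

Word = Tuple[float, str]  # (timestamp in seconds, token); assumed input format


def word_features(words: List[Word],
                  i: int,
                  cue_ngrams: Set[str],
                  thresholds_ms: Sequence[int] = (300, 400, 500, 600)
                  ) -> Dict[str, Union[float, bool]]:
    """Features for word i: silence before the word (difference between the
    timestamps of two adjacent words), letter case, and cue n-gram membership."""
    time_s, token = words[i]
    if i > 0:
        prev_time_s, prev_token = words[i - 1]
        silence_s = time_s - prev_time_s
        bigram = f"{prev_token} {token}".lower()
    else:
        silence_s = 0.0
        bigram = ""
    features: Dict[str, Union[float, bool]] = {
        "silence_before_s": silence_s,
        "is_capitalized": token[:1].isupper(),
        "is_cue_unigram": token.lower() in cue_ngrams,
        "is_cue_bigram": bigram in cue_ngrams,
    }
    for threshold in thresholds_ms:
        features[f"silence_gt_{threshold}ms"] = silence_s > threshold / 1000.0
    return features


# Toy transcript and cue list (hypothetical).
toy_words = [(0.0, "are"), (0.2, "you"), (1.0, "Good")]
toy_cues = {"are you", "good"}
feature_rows = [word_features(toy_words, i, toy_cues) for i in range(len(toy_words))]
```

Each row would then be paired with a label (segment boundary or segment continuation) and passed to a decision-tree learner.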
  • 28. Most Informative Features ● Division defined in the transcripts ● The length of silence (especially if it is longer than 300 ms, 400 ms, 500 ms, or 600 ms) ● TextTiling algorithm output ● Segment beginnings: “if”, “I’m”, “especially”, “the”, “are you”, “you have”, “VBP PRP VBG”, … ● Segment ends: “good”, “interesting”, “lot”, …
  • 30. Feature-based Segmentation Approaches Comparison (Search and Hyperlinking Task 2013, Search subtask; results on the subtitles)
      Beg.  End   MRR    MRRw   mGAP   #Seg    Len [s]
      --    --    0.656  0.052  0.027  2 k     2531.6
      Reg   Reg   0.671  0.388  0.245  234 k   49.5
      ML    --    0.549  0.117  0.060  3125 k  2.3
      --    ML    0.607  0.310  0.192  280 k   29.0
      ML    B+50  0.685  0.412  0.272  5820 k  49.6
      E+50  ML    0.715  0.428  0.298  2580 k  49.6
      ML    ML    0.626  0.392  0.229  5659 k  20.2
  • 31. Feature-based Segmentation vs. Fixed-Length Segmentation (Search and Hyperlinking Task 2014)
      ● Search Task
      Transcript  Segm.     Seg. Len.  MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   Fixed     60s        0.5127  0.7467  0.7267  0.6100  0.3538   0.3023
      Subtitles   Features  50s        0.8028  0.7867  0.7667  0.6933  0.3199   0.2350
      ● Hyperlinking Task
      Transcript  Segm.     Seg. Len.  MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   Fixed     60s        0.4366  0.8667  0.7700  0.5633  0.2724   0.2580
      Subtitles   Features  50s        0.8253  0.8867  0.8567  0.7383  0.2525   0.1991
  • 32. Visual Information in Segmentation (Search and Hyperlinking Task 2014, Search subtask) ● Training data used for segmentation tuning are visually static → visual information would not be helpful ● Create segments only if the visual similarity between adjacent segments < weight ● Tune the weight on the Search and Hyperlinking training data
      Transcript  Segm.            MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   Features         0.8028  0.7867  0.7667  0.6933  0.3199   0.2350
      Subtitles   Features+Visual  0.7701  0.7600  0.7500  0.6733  0.3285   0.2530
  • 33. Visual and Prosodic Similarity
  • 35. Visual Similarity Cont. ● We use Feature Signatures and Signature Quadratic Form Distance ● http://siret.ms.mff.cuni.cz
  • 36. Visual Similarity Results (Search and Hyperlinking Task 2014, Hyperlinking subtask)
      Transcript  Meta.  Weights  MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   No     None     0.1618  0.4786  0.4107  0.2893  0.1423   0.1216
      Subtitles   No     Visual   0.1660  0.4929  0.4143  0.3000  0.1483   0.1245
      Subtitles   Yes    None     0.4301  0.8600  0.7767  0.5483  0.2689   0.2465
      Subtitles   Yes    Visual   0.4366  0.8667  0.7700  0.5633  0.2724   0.2580
      LIMSI       Yes    None     0.4166  0.8533  0.7133  0.5450  0.2659   0.2297
      LIMSI       Yes    Visual   0.4168  0.8667  0.7333  0.5400  0.2692   0.2414
      LIUM        Yes    None     0.4226  0.8333  0.7300  0.5433  0.2593   0.2547
      LIUM        Yes    Visual   0.4212  0.8400  0.7367  0.5350  0.2622   0.2632
      NST         Yes    None     0.4072  0.8067  0.7000  0.5417  0.2611   0.2237
      NST         Yes    Visual   0.4160  0.8267  0.7167  0.5483  0.2655   0.2440
  • 37. Visual Similarity Results - Positive Query Examples
  • 38. Visual Similarity Results - Negative Query Examples
  • 40. Prosodic Similarity Results (Search and Hyperlinking Task 2014, Hyperlinking subtask) ● Small but promising improvement
      Transcript  Meta.  Weights   MAP     P5      P10     P20     MAP-bin  MAP-tol
      Subtitles   Yes    None      0.4301  0.8600  0.7767  0.5483  0.2689   0.2465
      Subtitles   Yes    Prosodic  0.4321  0.8533  0.7767  0.5517  0.2687   0.2473
  • 45. Conclusion ● Passage Retrieval ● Improves retrieval of relevant segments ● Can improve retrieval of full recordings ● Segmentation approach is crucial for the retrieval ● Fixed-length segmentation works well ● Feature-based segmentation outperforms fixed-length segmentation ● Visual and prosodic similarity can improve results of text-based retrieval
  • 46. Thank you ● This research is supported by the Charles University Grant Agency (GA UK n. 920913)