Television Linked To The Web

LinkedTV @ MediaEval
Search and Hyperlinking
M. Sahuguet¹, B. Huet¹, B. Cervenková², E. Apostolidis⁴, V. Mezaris⁴, D. Stein³,
S. Eickeler³, J.L. Redondo Garcia¹, R. Troncy¹, and L. Pikora²
MediaEval 2013 Workshop
Barcelona, Catalunya, Spain, 18-19 October 2013.
(1)

(2)

www.linkedtv.eu

(3)

(4)
LinkedTV ― Television Linked To the Web

LinkedTV: interweaving Web and TV into a single experience
Second screen scenario for enriching television content and achieving interaction between user and content

Web: http://www.linkedtv.eu

LinkedTV @ MediaEval Search and Hyperlinking 2013

10/18/2013
LinkedTV@MediaEval

MediaEval Search & Hyperlinking: an overview of LinkedTV’s enrichment process

- Brainstorming
- Pre-processing (BBC dataset)
- Video segmentation
- Indexing data in Lucene
- From visual cues to detected concepts
- Search task
- Hyperlinking task
- Conclusion

Brainstorming

Brainstorming meeting: Tasks and Dataset analysis

- Shots are too small to return to user
- Typos in the queries
- Duplicate videos in the dataset
- Visual concepts are not usable as such
- Visual cues may not be helpful
- Visual cues can also help as search terms
- Maybe we can segment the videos differently?
- Can we use speaker information?
- Name of show/channel may appear in the query
- Actors/Character names may appear
- What analysis can we further apply on videos?

Brainstorming

Brainstorming meeting: Tasks and Dataset analysis

- Search:
  - Getting the right video is possible
  - Need to extract a segment with good timing
- Segmentation level is of major importance
  - Shots are too short
  - We want to be as close as possible to the viewer
- Visual cues: not always helpful
  <visualQueues>2 men sitting opposite each other</visualQueues>
  <visualQueues>stands out and grabs your attention</visualQueues>
- Need to design a framework to use Visual Cues
- How can the LinkedTV media analysis tools be used?

Pre-processing dataset

- Processing ~1697 h of BBC video data

Task                             Partner      Processing time
Visual concept detection (151)   CERTH        20 days on 100 cores
Scene segmentation               CERTH        2 days on 6 cores
OCR                              Fraunhofer   1 day on 10 cores
Keywords extraction              Fraunhofer   5 hours
Named entities extraction        Eurecom      4 days
Face detection and tracking      Eurecom      4 days on 160 cores

Video Segmentation

- Shots (provided by Task Organisers)
- Scenes: groups of adjacent shots
  - Visual similarity
  - Temporal consistency
  - P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, and I. Trancoso. Temporal Video Segmentation to Scenes Using High-Level Audiovisual Features. IEEE Transactions on Circuits and Systems for Video Technology, 2011
- Sliding windows:
  - Inspired by M. Eskevich, G. Jones, C. Wartena, M. Larson, R. Aly, T. Verschoor, and R. Ordelman. Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search. 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012

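As a rough illustration of the sliding-window idea, the sketch below groups time-stamped transcript words into overlapping fixed-length segments. The slides do not state the window unit or the step, so the `Word` type, the 60-second `size` and the 30-second `step` are all assumptions made for this example, not the settings used in the runs.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video (assumed unit)


def sliding_windows(words, size=60.0, step=30.0):
    """Group time-stamped words into overlapping fixed-length windows.

    Returns a list of (window_start, window_end, text) tuples.
    Empty windows are skipped.
    """
    if not words:
        return []
    end_time = max(w.start for w in words)
    windows = []
    t = 0.0
    while t <= end_time:
        chunk = [w.text for w in words if t <= w.start < t + size]
        if chunk:
            windows.append((t, t + size, " ".join(chunk)))
        t += step
    return windows


# Hypothetical transcript fragment.
words = [Word("odd", 5.0), Word("cars", 20.0), Word("ferrari", 45.0), Word("track", 70.0)]
segments = sliding_windows(words, size=60.0, step=30.0)
```

Each resulting tuple could then be indexed as one searchable segment, alongside shots and scenes.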
Indexing data in Lucene

- Lucene engine for indexing the data
- Index at different temporal granularities:
  - Video level (pre-filtering)
  - Scene level
  - Shot level
  - Sliding-window segment level
- Index different features at each temporal granularity:
  - Text (transcripts, subtitles)
  - Metadata (title, synopsis, cast, etc.)
  - OCR
  - Visual concept values (floating point fields)
- Design a framework for querying indexes and returning video segments from a query

From visual cues to detected concepts

- Text search is straightforward (default, TF-IDF values)
- Need to incorporate visual information into the search
- Which concepts are present in the query?
  - Semantic word distance based on WordNet synsets
  - Mapping between keywords (extracted from the visual cues query) and visual concepts
    <visualQueues>animals, kenya wildlife reserve, marathon</visualQueues>
    mapped visual concepts: Athlete, Dogs, Horse, Animal
- Integration of detected visual concepts into the Lucene search:
  - Concepts filtering
    - Correct detection rate from the 100 first results: threshold at 0.5
    - Normalized confidence: threshold at 0.7
  - Concepts selection
  - Designing an enriched query: both textual (text query) and visual information (range query)

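The mapping from visual-cue keywords to concepts, and the enriched query built from it, can be sketched as follows. This is only an illustration: the similarity table is a hypothetical stand-in for a WordNet-based word distance, the 0.6 mapping threshold, the field name `text` and the 0.7 lower bound are assumptions, and the output string merely shows the shape of a query in Lucene's classic syntax (`+` for a required clause, `field:[a TO b]` for a range). How numeric ranges are actually executed depends on how the concept fields are indexed.

```python
# Hypothetical similarity scores standing in for a WordNet-based word distance.
TOY_SIMILARITY = {
    ("animals", "Animal"): 0.95,
    ("animals", "Dogs"): 0.70,
    ("animals", "Horse"): 0.68,
    ("marathon", "Athlete"): 0.72,
    ("reserve", "Horse"): 0.20,
}


def map_keywords_to_concepts(keywords, concepts, threshold=0.6):
    """Return concepts whose best similarity to any keyword reaches the threshold."""
    selected = []
    for concept in concepts:
        best = max(
            (TOY_SIMILARITY.get((kw, concept), 0.0) for kw in keywords),
            default=0.0,
        )
        if best >= threshold:
            selected.append(concept)
    return selected


def build_enriched_query(text_terms, concepts, min_score=0.7):
    """Combine a required text clause with optional range clauses on concept fields."""
    clauses = ["+text:(" + " ".join(text_terms) + ")"]
    clauses += [f"{concept}:[{min_score} TO 1.0]" for concept in concepts]
    return " ".join(clauses)


keywords = ["animals", "kenya", "wildlife", "reserve", "marathon"]
concepts = ["Athlete", "Dogs", "Horse", "Animal", "Car"]
mapped = map_keywords_to_concepts(keywords, concepts)
query = build_enriched_query(keywords, mapped)
```

With these toy scores, `mapped` reproduces the example on the slide (Athlete, Dogs, Horse, Animal) and leaves out the unrelated concept.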
Search task

- Search videos at different temporal granularities
- Concatenation of textual and visual query for text search
  <queryText>Odd cars, Fake MacLaren, </queryText>
  <visualQueues>Jeremy Clarkson, Richard Hammond, James May, Ferrari 430 Scuderia</visualQueues>
  - Visual cues can be found in queryText too
- If a TV channel is mentioned, perform filtering:
  <visualQueues>Cannabis on BBC ONE</visualQueues>
  - Should also be done on show titles (for next year?)
- For some runs, filter at video level first
  - Make a text query on the video index
  - Use the first 20 videos for segment search
  - Focused search

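The video-level pre-filtering step can be pictured as a two-stage search: rank whole videos first, then search segments only inside the best-ranked videos. The sketch below is a minimal illustration under stated assumptions: `overlap_score` is a toy term-overlap score standing in for Lucene's TF-IDF ranking, and the dictionary layout of `videos` and `segments` is invented for the example.

```python
def overlap_score(query_terms, text):
    """Toy relevance score: number of query terms found in the text."""
    tokens = set(text.lower().split())
    return sum(1 for term in query_terms if term.lower() in tokens)


def two_stage_search(query_terms, videos, segments, top_videos=20):
    """Rank videos first, then rank only the segments of the best videos."""
    ranked_videos = sorted(
        videos, key=lambda v: overlap_score(query_terms, v["text"]), reverse=True
    )
    kept = {v["id"] for v in ranked_videos[:top_videos]}
    candidates = [s for s in segments if s["video_id"] in kept]
    scored = [(overlap_score(query_terms, s["text"]), s) for s in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for score, segment in scored if score > 0]


# Hypothetical data.
videos = [
    {"id": "v1", "text": "top gear odd cars ferrari"},
    {"id": "v2", "text": "gardening tips roses"},
]
segments = [
    {"id": "s1", "video_id": "v1", "text": "fake mclaren and odd cars"},
    {"id": "s2", "video_id": "v1", "text": "studio chat"},
    {"id": "s3", "video_id": "v2", "text": "odd roses"},
]
hits = two_stage_search(["odd", "cars"], videos, segments, top_videos=1)
```

Here segment `s3` matches one query term but is never considered, because its video did not survive the first stage.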
Search task

- Different granularities:
  - scenes
  - partial scenes (begin at a shot; end at the corresponding scene ending)
  - temporally clustered shots (inside a video)
  - sliding window
- Different textual data (transcript/ASR)
- With/without visual concepts
- With/without use of synonyms
- 9 runs
- Goal: comparing approaches and features

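The "partial scene" granularity can be made concrete with a small sketch: the segment starts at the matching shot and ends where the enclosing scene ends. The representation of scenes as `(start, end)` pairs in seconds is an assumption made for this example.

```python
def partial_scene(shot_start, scenes):
    """Return (start, end): start at the shot, end at the end of its enclosing scene.

    `scenes` is a list of (scene_start, scene_end) pairs; returns None if the
    shot does not fall inside any scene.
    """
    for scene_start, scene_end in scenes:
        if scene_start <= shot_start < scene_end:
            return (shot_start, scene_end)
    return None


# Hypothetical scene boundaries, in seconds.
scenes = [(0.0, 95.0), (95.0, 210.0), (210.0, 300.0)]
segment = partial_scene(120.5, scenes)
```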
Search task – Results

Run               MRR      mGAP     MASP
scenes-C          0.3095   0.1770   0.1951
scenes-noC        0.3091   0.1767   0.1947
scenes-S          0.3152   0.1635   0.2021
scenes-I          0.2613   0.1444   0.1582
scenes-U          0.2458   0.1344   0.1528
part-scenes-C     0.2284   0.1241   0.1024
part-scenes-noC   0.2281   0.1240   0.1021
clustering-C      0.2929   0.1525   0.1814
clustering-noC    0.2849   0.1479   0.1713
SW-60-S           0.2833   0.1925   0.2027
SW-60-I           0.1965   0.1206   0.1204
SW-40-U           0.2368   0.1342   0.1501

- scenes-C: scenes search using textual and visual concepts
- scenes-S: scene search using only subtitles
- SW-60-S: search over sliding window segments (size 60)

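For readers unfamiliar with MRR, the sketch below computes the generic Mean Reciprocal Rank: for each query, take 1/rank of the first relevant result (0 if none is found) and average over queries. It is not the task's official scoring, which may additionally account for segment timing; the query ids and results are invented for the example.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """Average over queries of 1/rank of the first relevant result (0 if none)."""
    total = 0.0
    for query_id, results in ranked_lists.items():
        for rank, item in enumerate(results, start=1):
            if item in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical ranked results and ground truth.
runs = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"], "q3": ["g", "h"]}
relevant = {"q1": {"b"}, "q2": {"d"}, "q3": {"z"}}
mrr = mean_reciprocal_rank(runs, relevant)  # (1/2 + 1 + 0) / 3
```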
Hyperlinking Task

- Re-use of the search component
  - Shot clustering approach
  - Scene approach
  - Create a query from the anchor!
    - Get subtitles and shots aligned with the anchor
    - Text query: extract keywords using Alchemy API (higher weight to anchor than to context)
    - Visual cues query: for each concept, highest score over all shots
- Use of the “MoreLikeThis” (MLT) feature in Lucene, combined with THD
  - Sliding window approach
  - Create temporary documents from the anchor!
  - THD = Targeted Hypernym Discovery (UEP): returns semantic annotation, synonyms
  - MLT: finding documents similar to the input

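The visual part of the anchor query ("for each concept, highest score over all shots") amounts to a per-concept maximum over the shots aligned with the anchor. A minimal sketch, assuming each shot is represented as a dictionary from concept name to detector score (a layout chosen for this example):

```python
def anchor_concept_scores(shots):
    """For each concept, keep the highest score observed over all anchor shots."""
    best = {}
    for shot in shots:
        for concept, score in shot.items():
            best[concept] = max(score, best.get(concept, score))
    return best


# Hypothetical detector scores for two shots aligned with an anchor.
shots = [
    {"Animal": 0.4, "Car": 0.9},
    {"Animal": 0.8, "Car": 0.1, "Athlete": 0.3},
]
anchor_scores = anchor_concept_scores(shots)
```

The resulting dictionary could then feed the same kind of concept clauses used in the search task.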
Hyperlinking results

Run             MAP      P-5      P-10     P-20
LA clustering   0.0577   0.4467   0.3200   0.2067
LA SW MLT       0.1201   0.4200   0.4200   0.3217
LA scenes       0.1770   0.6867   0.5867   0.4167
LC clustering   0.0823   0.5733   0.4833   0.2767
LC SW MLT       0.1820   0.5667   0.5667   0.4300
LC scenes       0.2523   0.8133   0.7300   0.5283

- LA scenes: scenes search in LA condition (anchor only)
- LC scenes: scenes search in LC condition (anchor + context)

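The P-5, P-10 and P-20 columns are precision at rank k: the fraction of the top k returned links judged relevant. A minimal sketch of the generic definition, with invented data (the task's own relevance judgments are not reproduced here):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / k


# Hypothetical ranked links for one anchor and its relevant targets.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "x"}
p5 = precision_at_k(ranked, relevant, 5)  # 2 relevant among the top 5
```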
Conclusions

- Major findings
  - Scene segmentation approach performs best
  - Improvement when using visual concepts, when carefully employed
- Future work
  - Improve scene detection
    - Follow human perception more closely
  - Improve the link between query and visual concepts
  - Use named entities

Thank you
Questions?

Editor's notes

  1. Input from Daniel regarding the progress in Audio Analysis and VideoOCR:
     - adoption of new video OCR
     - speech processing: preparation of new paradigms:
       - deep neural networks (automatic speech recognition)
       - i-vectors + SVMs using cosine kernel (speaker recognition)