The document discusses the development of an audiovisual search system called Trouvaille. It describes milestones for the system, including searching capabilities and computer-assisted analysis. Key aspects of the system are discussed such as using a thesaurus, faceted search, and time-coded metadata to improve search precision and recall for retrieving relevant audiovisual content.
PISA Production, Indexing and Search of Audio-visual Material
1. PISA
Production, Indexing and Search of Audio-visual Material
The mathematical logic behind search and retrieval of audiovisual material
Valérie De Witte, VRT-medialab
2. Archiving
[Screenshot: search screen of the legacy archive database ("Opzoekscherm FILM", set 16, 1 result, page 1 of 3) for the keyword query "ibm and vrt", shown next to the record it retrieved. The search form offers the fields archive number, broadcast year/month/day, fragment number, fragment duration, series, format, tape number, episode, episode number, programme, broadcast date, fragment title, text, category, recording date, recording number, journalist and rights holder. The status line reads "The strings required for the operation are not defined". Function keys: End, Sets, Refset, Show, Previous, Next/Clear, Thesaurus, Command, Search.]
Retrieved record:
archive number (archiefnummer) : ALG 20010813 1
fragment number (fragmentnummer) : 1
series (reeks) : 1000 ZONNEN EN GARNALEN
tape number (bandnummer) : E03024404
format (formaat) : DBCM
fragment title (fragmenttitel) : 1000 ZONNEN & GARNALEN
picture (beeld) : KL/PALPLUS
fragment duration (fragmentduur) : 18 20
text (tekst) :
0'00" Tourist reportage magazine: opening titles and overview of topics
0'50" TODAY: artist Luc Hofkens designed an oasis on his roof terrace in Borgerhout that is reminiscent of the Grand Canyon; interview with Luc and his wife Marilou; exterior shots of the roof and its surroundings, outside of a worker's house, pan across rock walls, crates of water, planting, photo album showing the progress of the works
4'00" JUNIOR: Klaartje Alaerts, 13 years old, wants to become an astronaut; she visits the Euro Space Center with space shuttles and rockets; simulation in a space shuttle; interview; has seen a UFO; builds a small rocket herself and launches it
7'50" DE SCHEURKALENDER (the tear-off calendar): archive IBM advertising film; interview with Maurice De Wilde; first personal computer
keywords (trefwoorden, thesaurus terms as catalogued) : BELGIE; BORGERHOUT; ARTIEST; OASE; KUNST; GRAND CANYON (NATUURGEBIED); DAK; TERRAS; INTERVIEW; EURO SPACE CENTER; RUIMTEVAART; PC; BOOTTOCHT; RIJKDOM; PASSAGIER; GASTRONOMIE; RESTAURANT; PERSONEEL; VAKANTIE; BINNENBEELD; SCHIP; BECKERS LEEN; VRT; LOTTO; RADIOOMROEPSTER; KLANKSTUDIO; UITVINDING; BARBECUE; BETONMOLEN; IBM; RECLAMESPOT
rights holder (rechthebbende) : VRT
medialab
3. Issues
-> “Annotation” provides structured metadata and needs to become scalable for the increasing set of information
-> Automated processing of information is a key issue, but it requires correct and structured metadata
-> Product Engineering is the source of structured and meaningful information
5. Milestone 1 – Searching Audiovisual Material
Assumptions:
• A “scene” is the logical unit of search
The ideal search engine:
• retrieves all relevant items (recall 100%)
• without false positives (precision 100%)
• provides grouping of similar results
• gives instant access to digital media
• respects intellectual property rights.
[Architecture diagram. Components shown: Legacy Video Library (Basisplus), Raw Material (EBU Superpop), current news items (Ardome), NewsML-G2, Media Asset Management System (Ardome), Search Engine (Lucene/SOLR), Search Client (Custom Development).]
6. Milestone 2 – Computer Assisted Analysis
• Shot segmentation
• Audio classification
• Face detection
• Face recognition
• Scene detection
• Subtitle processing
• Topic recognition
[Architecture diagram. Components shown: Legacy Video Library (Basisplus), Raw Material (EBU Superpop), current news items (Ardome), NewsML-G2, Media Asset Management System (Ardome), Search Engine (Lucene/SOLR), Media Production, and the analysis modules Shot Segmentation, Face Detection, Scene Detection and Topic Recognition.]
7. Search systems
Current search implementations are excellent in terms of search capabilities
- Boolean logic (AND-, OR- and NOT-operators)
- truncation (plural, stemming, capital letters)
- thesaurus (synonyms, homonyms,…)
- structured metadata and range search
- single word and phrase searching
But… retrieval efficiency
- coverage (composition of the index used, which parts of the documents are indexed, update frequency)
- response time (average waiting time between issuing a search command and displaying the first batch of results on the screen)
- user effort (user-friendly interface)
- output options (number of output options, layout, clarity)
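The Boolean operators listed above can be illustrated with a small inverted index. This is only a minimal sketch: the three sample documents, the whitespace tokenisation and the function names are assumptions made for the example and do not describe how Trouvaille or Lucene/SOLR work internally.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> free-text description.
DOCS = {
    1: "interview with an artist about a roof terrace",
    2: "archive advertising film for the first personal computer",
    3: "interview about the first personal computer",
}


def build_index(docs):
    """Map every lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term (AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one term (OR)."""
    result = set()
    for t in terms:
        result |= index.get(t.lower(), set())
    return result


def search_not(index, all_ids, term):
    """Documents that do not contain the term (NOT)."""
    return set(all_ids) - index.get(term.lower(), set())


INDEX = build_index(DOCS)
print(search_and(INDEX, "interview", "computer"))  # {3}
print(search_or(INDEX, "artist", "archive"))       # {1, 2}
print(search_not(INDEX, DOCS, "interview"))        # {2}
```

Each operator reduces to a set operation on posting lists, which is why Boolean search is cheap to offer; the retrieval-efficiency criteria on this slide are what such an index alone does not address.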
8. Qualitative evaluation
-> $\text{precision} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$
- fraction of the returned results that are relevant
- requires knowledge of the relevant and non-relevant hits in the set of retrieved documents
9. Qualitative evaluation
-> $\text{recall} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$
- fraction of the relevant documents in the collection that are retrieved
- requires knowledge not only of the relevant and retrieved documents but also of those not retrieved
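As a worked example of the two definitions, the sketch below computes precision and recall from a set of relevant documents and a set of retrieved documents. The document identifiers are made up, and returning 0.0 for an empty denominator is a convention chosen for this example.

```python
def precision(relevant, retrieved):
    """|relevant ∩ retrieved| / |retrieved| (0.0 if nothing was retrieved)."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not retrieved:
        return 0.0
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant, retrieved):
    """|relevant ∩ retrieved| / |relevant| (0.0 if nothing is relevant)."""
    relevant, retrieved = set(relevant), set(retrieved)
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)


# Hypothetical judgement: four relevant documents, three retrieved, two in common.
relevant = {"a", "b", "c", "d"}
retrieved = {"a", "b", "x"}

p = precision(relevant, retrieved)  # 2 of the 3 retrieved documents are relevant
r = recall(relevant, retrieved)     # 2 of the 4 relevant documents were retrieved
print(f"precision={p:.2f} recall={r:.2f}")
```

Note that `recall` needs the full set of relevant documents, including those the system never returned, which is the practical difficulty pointed out on this slide.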
10. Qualitative evaluation
• There is often an inverse relationship between precision and recall: increasing one tends to reduce the other
• Which of the two matters more depends on the use case
-> in some use cases only the hits at the top of the list have to be relevant, and there is no interest in looking at every relevant document (high precision)
-> in other use cases recall should be as high as possible, and low precision is tolerated
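The trade-off can be made concrete by evaluating a ranked result list at different cut-off points. The following is a minimal sketch with an invented ranking and invented relevance judgements; cutting the list after k results is one common way to look at the trade-off, not something the slides prescribe.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top-k results of a ranking are shown."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        raise ValueError("at least one relevant document is required")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranking returned by a search engine, best match first.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
# Hypothetical ground truth: the documents that are actually relevant.
relevant = {"d1", "d2", "d4", "d6"}

for k in (2, 4, 6):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

In this example, showing only the top 2 results gives perfect precision but misses half of the relevant documents, while showing all 6 reaches full recall at the cost of lower precision.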
11. Trouvaille
[Chart with Precision and Recall axes, positioning "Actual Search" (the current search) and "Google".]
12. Trouvaille
• Thesaurus application:
  • During search: keywords in auto-completion, spellcheck and synonyms
• User-friendly interface:
  • Faceted search: programme, genre, journalist
  • Different output views: keywords, thumbnails, Google Maps
• Use of a standard: NewsML-G2
• Metadata is time-coded
  -> Matching keyframe
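Two of the points above, faceted search and time-coded metadata with a matching keyframe, can be sketched in a few lines. Everything in this example is hypothetical: the `Annotation` fields, the sample data, the keyframe time codes and the rule of picking the latest keyframe at or before the time code are assumptions for illustration, not the Trouvaille data model.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One time-coded metadata entry (all field names are hypothetical)."""
    item_id: str
    start_seconds: int
    keyword: str
    programme: str
    genre: str


ANNOTATIONS = [
    Annotation("item-1", 0, "overview", "magazine-a", "travel"),
    Annotation("item-1", 50, "artist", "magazine-a", "travel"),
    Annotation("item-1", 240, "space travel", "magazine-a", "travel"),
    Annotation("item-2", 30, "artist", "news-b", "news"),
]

# Hypothetical keyframe time codes (seconds) per item, sorted ascending.
KEYFRAMES = {"item-1": [0, 45, 120, 230, 400], "item-2": [0, 25, 60]}


def search(keyword):
    """Return every annotation whose keyword matches exactly."""
    return [a for a in ANNOTATIONS if a.keyword == keyword]


def facet_counts(hits, facet):
    """Count the hits per value of one facet, e.g. 'programme' or 'genre'."""
    return Counter(getattr(hit, facet) for hit in hits)


def matching_keyframe(hit):
    """Latest keyframe at or before the hit's time code (first keyframe otherwise)."""
    frames = KEYFRAMES[hit.item_id]
    pos = bisect_right(frames, hit.start_seconds)
    return frames[max(pos - 1, 0)]


hits = search("artist")
print(facet_counts(hits, "genre"))
print([(h.item_id, h.start_seconds, matching_keyframe(h)) for h in hits])
```

Because each keyword carries its own time code, a hit can point to the moment inside the programme where the keyword applies and be shown with a nearby keyframe, rather than with a single thumbnail for the whole item.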
13. Trouvaille: future work
• Clustering: integration of copy detection to find duplicates in the retrieved hits
• Intelligent information clustering: detection of concept relationships
• Feature extraction: topic detection
• Combination of system quality and user satisfaction for the evaluation
[Chart with Precision and Recall axes, each up to 100%, positioning "Actual Search", "Google", "Trouvaille (MS1)", "Feature extraction" and "Intelligent Information clustering".]