In the research project PISA we investigated how powerful search engines can be built, given a library of audiovisual material that has been analysed objectively and intelligently.
PISA - Proof of Concept
1. medialab
PISA – Proof of Concept
Production, Indexing and Search of Audiovisual Material
2. PISA - Positioning
PISA – Production and Indexing of Audiovisual Media
• 30 man-years
• Virtual Modelling
• Computer Assisted Manufacturing
• Unsupervised Feature Extraction
• Search Engine Technology
3. Context - Digital Media Production
Suprastructure – Metadata Mgnt
Production and distribution
Editing, Mastering
Media
Ingest, Asset Mgnt, Playout
Infrastructure - Networks and Storage
Production Platform
4. Digital Asset Management, Content Management…
Suprastructure – Metadata Mgnt
Production and distribution
Infrastructure - Networks and Storage
Production Platform
5. User Expectations
Communication (Information)
General Data
Suprastructure – Metadata Mgnt
Metadata
Production and distribution
Media Production
• Mass-production
• Anywhere, anytime, on any device
• Personalisation
Infrastructure - Networks and Storage
The ideal search engine
• retrieves all relevant items (recall 100%)
• without false positives (precision 100%)
• enables instant access to digital media
• with respect for intellectual property rights.
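The two quality measures named above can be made concrete with a small calculation. The following is a minimal sketch, not part of the PISA software: the function name and the clip identifiers are hypothetical, and returning 1.0 for an empty set is only one common convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    # Convention (an assumption): an empty result set counts as precision 1.0,
    # an empty relevant set as recall 1.0.
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical item ids, for illustration only.
relevant = {"clip-1", "clip-2", "clip-3", "clip-4"}
retrieved = {"clip-1", "clip-2", "clip-9"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the three returned clips are relevant and two of the four relevant clips are found. The ideal engine described on the slide would return exactly the relevant set, giving 1.0 for both values.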
Production Platform
6. Archiving – Disclosure, Annotation,…
Search screen FILM (Set: 16, Hits: 1, page 1 of 3)
query keywords : ibm and vrt

archive number : ALG 20010813 1
fragment number : 1
series : 1000 ZONNEN EN GARNALEN
tape number : E03024404
format : DBCM
fragment title : 1000 ZONNEN & GARNALEN
picture : KL/PALPLUS
fragment duration : 18 20
text :
0'00" Tourist feature magazine, overview of topics; opening titles of the tourist feature magazine, overview of topics
0'50" VANDAAG: artist Luc Hofkens designed an oasis on his roof terrace in Borgerhout that is reminiscent of the Grand Canyon; interview with Luc and his wife Marilou; exterior shots of the roof and its surroundings, outside of the worker's house, pan over rock walls, crates of water, planting, photo album showing the progress of the works
4'00" JUNIOR: Klaartje Alaerts, 13 years old, wants to become an astronaut; she visits the Euro Space Center with space shuttles, rockets, a simulation in a space shuttle; interview; has seen a UFO; builds a small rocket herself and launches it
7'50" DE SCHEURKALENDER: archive footage of an IBM advertising film; interview with Maurice De Wilde; first personal computer
keywords : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND CANYON (NATURE RESERVE); ROOF; TERRACE; INTERVIEW; EURO SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH; PASSENGER; GASTRONOMY; RESTAURANT; STAFF; HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT; LOTTO; RADIO ANNOUNCER; SOUND STUDIO; INVENTION; BARBECUE; CONCRETE MIXER; IBM; COMMERCIAL
rights holder : VRT
8. Web 2.0 – « User Generated Content », « Social Tagging »?
9. Catch-22
-> “Annotation” is a subjective interpretation, and
thus it does not scale
-> Automated processing of information is a key
differentiator, but it requires correct and
structured metadata
-> Product engineering is the source of structured
and meaningful information, but creative staff
are not receptive to technology
10. Objectives - Proof of Concept
• One Set of Numbers(!)
• Model Driven Development
• Computer Assisted Manufacturing
• Unsupervised Feature Extraction
• Efficient Search and Retrieval
Develop an extensible data model and a consistent application
framework, accessible via an intuitive user interface
(-> Digitizing analogue and disintegrated information flows)
11. PISA - Overview
Computer Assisted Design: Script Editing (Concept)
• Parse scenario
• Shooting script editor
• Storyboard
Model Driven Development: Virtual Modelling (Virtual Model)
• Setting (Stage properties, light)
• Character
• Synthetic Speech
• Sound effects
• Character animation
• Virtual camera
Computer Assisted Manufacturing: Production (Realisation)
• Ingest Footage
• Editing
• Mastering
• Reproduction to alternative distribution channels
Automated Reverse Engineering: Analysis (Quantization)
• Shot segmentation
• Video footprint and reuse detection
• Biometric face detection
• Background analysis
• Speech-to-text
Intelligent Analysis and Interpretation (Abstract Information)
• Character identification
• Background categorisation and identification
• Topic and event detection
Search Engine: Indexing and Retrieval
• Timecode based indexing
• Geo-temporal reference
• Taxonomy based indexing and search
• Facetted search
13. The Search Engine
• Search federation by system integration
• Facetted search
• Integrated application of keywords
• Intuitive and structured presentation of results
• Random access to audiovisual material
[Diagram labels: Legacy Video Library (Basisplus); Raw Material (EBU Superpop); Actual news items (Ardome); <NewsML-G2>; Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); Search Client (Custom Development)]
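The idea behind facetted search on this slide can be sketched in a few lines: a keyword query yields hits, the hits are counted per facet value, and choosing a facet value narrows the result. This is a pure-Python illustration only. The record fields, values and function names are hypothetical and do not reflect the PISA schema, and a search engine such as Lucene/SOLR provides faceting natively.

```python
from collections import Counter

# Hypothetical records; field names are illustrative, not the PISA schema.
RECORDS = [
    {"id": 1, "library": "Basisplus", "keywords": ["IBM", "VRT"], "year": 2001},
    {"id": 2, "library": "Ardome", "keywords": ["VRT", "LOTTO"], "year": 2008},
    {"id": 3, "library": "Ardome", "keywords": ["IBM"], "year": 2008},
]


def search(records, keyword=None, **filters):
    """Keep records that contain `keyword` and match every facet filter."""
    results = []
    for rec in records:
        if keyword is not None and keyword not in rec["keywords"]:
            continue
        if all(rec.get(field) == value for field, value in filters.items()):
            results.append(rec)
    return results


def facet_counts(records, field):
    """Count how many records fall under each value of one facet field."""
    return Counter(rec[field] for rec in records)


hits = search(RECORDS, keyword="IBM")
by_library = facet_counts(hits, "library")  # counts offered to the user as facets
narrowed = search(RECORDS, keyword="IBM", library="Ardome")  # user picks a facet
print(dict(by_library), [rec["id"] for rec in narrowed])
```

Because the counts are computed on the current hit list, each facet value shown to the user is guaranteed to lead to a non-empty result.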
16. Intelligent Analysis
Unsupervised feature extraction provides time-coded attributes:
• Shot segmentation and keyframe extraction
• Audio segmentation and speaker recognition
• Subtitle processing and speech recognition
• Taxonomy-driven topic detection
• Face recognition
• Scene recognition
• Copy detection
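As an illustration of how one of these time-coded attributes could be produced, the sketch below detects shot boundaries by comparing grey-level histograms of consecutive frames. It is a textbook-style approach chosen for brevity, not the method used in PISA. The frame representation (flat lists of pixel values), the threshold and the 25 fps frame rate are all assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalised grey-level histogram of a frame given as a flat list of pixels."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames that start a new shot."""
    if not frames:
        return []
    boundaries = []
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        # L1 distance between consecutive histograms, between 0 and 2.
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries


def to_timecode(frame_index, fps=25):
    """Convert a frame index to HH:MM:SS:FF (25 fps is an assumption)."""
    seconds, ff = divmod(frame_index, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


# Synthetic footage: 30 dark frames followed by 20 bright frames.
dark, bright = [10] * 16, [240] * 16
frames = [dark] * 30 + [bright] * 20
cuts = shot_boundaries(frames)
print([to_timecode(i) for i in cuts])
```

Each detected boundary is reported as a timecode, which is the form in which such attributes can later be indexed.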
[Diagram labels: Legacy Video Library (Basisplus); Raw Material (EBU Superpop); Actual news items (Ardome); <NewsML-G2>; Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); Media Production; Shot Segmentation; Face Detection; Topic Detection; Speech Recognition]
17. Conclusion
• Enterprise search: structured metadata, a limited number of libraries, a limited number
of records per library, dependencies between objects
• Intelligent search federation is aware of the media production process: scripts,
web pages, subtitles and formal annotation may represent the same editorial object
• Random access to audiovisual material requires an index based on timecode, not on
« word position in a document »
• Ontology-driven application logic is essential to enable semantic awareness, i.e.
resolving synonyms and disambiguating homonyms
• The perfect search engine is not for sale yet; it requires ground-up design
and development.
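Two of the points above, indexing on timecode instead of word position and resolving synonyms, can be combined in a small sketch. Every posting stores an item id and an offset in seconds, and terms are mapped to a preferred label before indexing and lookup. The class, the synonym table and the item id are hypothetical, and homonym disambiguation is not covered.

```python
from collections import defaultdict

# Hypothetical synonym table: surface term -> preferred concept label.
SYNONYMS = {
    "pc": "personal computer",
    "raket": "rocket",
}


def normalise(term):
    """Lower-case a term and map it to its preferred label if one is known."""
    term = term.lower()
    return SYNONYMS.get(term, term)


class TimecodeIndex:
    """Inverted index whose postings are (item id, offset in seconds)."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, item_id, start_seconds, terms):
        for term in terms:
            self._postings[normalise(term)].append((item_id, start_seconds))

    def lookup(self, term):
        """Return all (item id, offset) pairs for a term, in playback order."""
        return sorted(self._postings.get(normalise(term), []))


index = TimecodeIndex()
index.add("item-1", 50, ["oasis", "interview"])   # segment starting at 0'50"
index.add("item-1", 240, ["raket", "interview"])  # segment starting at 4'00"
index.add("item-1", 470, ["PC", "IBM"])           # segment starting at 7'50"

print(index.lookup("personal computer"))
print(index.lookup("interview"))
```

A lookup returns positions in time, so a player can jump straight to the matching segment instead of opening the item at its beginning.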