PISA
Production, Indexing and Search
of Audio-visual Material
 The mathematical logic behind search and retrieval
          of audiovisual material
           Valérie De Witte, VRT-medialab
Archiving



[Screenshot of the legacy archive search screen; Dutch field labels and text translated]

Search screen FILM      Set: 16   Count: 1   page 1 of 3
 keywords: ibm and vrt
 Empty query fields: archive number, broadcast year / month / day, fragment number,
   fragment duration, series, format, tape number, episode, episode number, programme,
   broadcast date, fragment title, text, category, recording date, recording number,
   journalist, rights holder
 SETS
 The strings required for the operation are not defined
 Function keys: F11 Exit, F12 Sets, F13 Refset, F14 Show, F17 Previous, F18 Next/Clear,
   F19 Thesaurus, F20 Command, Ent Search

Retrieved record:
 archive number  : ALG 20010813 1
 fragment number : 1
 series          : 1000 ZONNEN EN GARNALEN
 tape number     : E03024404
 format          : DBCM
 fragment title  : 1000 ZONNEN & GARNALEN
 picture         : KL/PALPLUS
 fragment duration : 18 20
 text            : 0'00" TOURIST REPORTAGE MAGAZINE, OVERVIEW OF TOPICS. OPENING TITLES
                   OF THE TOURIST REPORTAGE MAGAZINE, OVERVIEW OF TOPICS
                   0'50" VANDAAG: ARTIST LUC HOFKENS DESIGNED AN OASIS ON HIS ROOF
                   TERRACE IN BORGERHOUT THAT IS REMINISCENT OF THE GRAND CANYON.
                   INTERVIEW WITH LUC AND HIS WIFE MARILOU. EXTERIOR SHOTS: ROOF WITH
                   SURROUNDINGS, EXTERIOR OF A WORKER'S HOUSE, PAN ACROSS ROCK WALLS,
                   CRATES WITH WATER, PLANTING, PHOTO ALBUM SHOWING THE PROGRESS OF
                   THE WORKS
                   4'00" JUNIOR: KLAARTJE ALAERTS, 13, WANTS TO BECOME AN ASTRONAUT.
                   SHE VISITS THE EURO SPACE CENTER WITH SPACE SHUTTLES AND ROCKETS.
                   SIMULATION IN A SPACE SHUTTLE, INTERVIEW, HAS SEEN A UFO, BUILDS A
                   SMALL ROCKET HERSELF AND LAUNCHES IT
                   7'50" DE SCHEURKALENDER: ARCHIVE IBM COMMERCIAL FILM, INTERVIEW
                   WITH MAURICE DE WILDE, FIRST PERSONAL COMPUTER
 keywords        : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND CANYON (NATURE AREA);
                   ROOF; TERRACE; INTERVIEW; EURO SPACE CENTER; SPACE TRAVEL; PC;
                   BOAT TRIP; WEALTH; PASSENGER; GASTRONOMY; RESTAURANT; STAFF;
                   HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT; LOTTO;
                   RADIO ANNOUNCER; SOUND STUDIO; INVENTION; BARBECUE;
                   CONCRETE MIXER; IBM; COMMERCIAL
 rights holder   : VRT
medialab
Issues




               -> “Annotation” provides structured metadata and
                  needs to become scalable for the increasing set
                  of information

               -> Automated processing of information is a key
                  issue, but it requires correct and structured
                  metadata

               -> Product Engineering is the source of structured
                  and meaningful information




Alternative solution




Milestone 1 – Searching Audiovisual Material
    Assumptions:
    • A “scene” is the logical unit of search

    The ideal search engine:
    • retrieves all relevant items (recall 100%)
    • returns no false positives (precision 100%)
    • groups similar results
    • gives instant access to digital media
    • respects intellectual property rights

    [Architecture diagram. Components: Legacy Video Library (Basisplus), Raw Material
     (EBU Superpop), Current news items (Ardome), NewsML-G2, Media Asset Management
     System (Ardome), Search Engine (Lucene/SOLR), Search Client (Custom Development)]
Milestone 2 – Computer Assisted Analysis
    • Shot segmentation
    • Audio classification
    • Face detection
    • Face recognition
    • Scene detection
    • Subtitle processing
    • Topic recognition

    [Architecture diagram. Components: Legacy Video Library (Basisplus), Raw Material
     (EBU Superpop), Actual news items (Ardome), NewsML-G2, Media Asset Management
     System (Ardome), Search Engine (Lucene/SOLR), Media Production, and the analysis
     modules Shot Segmentation, Face Detection, Scene Detection and Topic Recognition]
Search systems

      Current search implementations are excellent in terms of search capabilities:
                - Boolean logic (AND, OR and NOT operators)
                - truncation (plurals, stemming, capital letters)
                - thesaurus (synonyms, homonyms, …)
                - structured metadata and range search
                - single-word and phrase searching

      But retrieval efficiency also matters:
                - coverage (composition of the index, which parts of the documents
                  are indexed, update frequency)
                - response time (average waiting time between issuing a search
                  command and the first batch of results appearing on screen)
                - user effort (user-friendly interface)
                - output options (number of output options, layout, clarity)
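
As a rough illustration of the Boolean operators in the capabilities list, the sketch below builds a tiny inverted index and evaluates AND, OR and NOT as set operations. The three records, their keywords and all function names are invented for illustration; this is not a description of how Basisplus or Lucene/SOLR work internally.

```python
from collections import defaultdict

# Hypothetical records (id -> keywords); illustrative data only.
RECORDS = {
    1: {"ibm", "vrt", "interview", "pc"},
    2: {"vrt", "lotto"},
    3: {"ibm", "reclamespot"},
}


def build_index(records):
    """Map each keyword to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, keywords in records.items():
        for keyword in keywords:
            index[keyword].add(record_id)
    return index


INDEX = build_index(RECORDS)
ALL_IDS = set(RECORDS)


def term(keyword):
    """Ids of all records containing the keyword (empty set if unknown)."""
    return INDEX.get(keyword, set())


def and_(a, b):
    return a & b          # records matching both operands


def or_(a, b):
    return a | b          # records matching either operand


def not_(a):
    return ALL_IDS - a    # records not matching the operand


print(sorted(and_(term("ibm"), term("vrt"))))        # [1]
print(sorted(or_(term("ibm"), term("vrt"))))         # [1, 2, 3]
print(sorted(and_(term("vrt"), not_(term("ibm")))))  # [2]
```

A query such as "ibm and vrt" then reduces to one set intersection, which is why Boolean search is cheap but says nothing about how relevant each hit is.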




Qualitative evaluation

      -> $\text{precision} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$

           - fraction of the returned results that are relevant

           - requires knowledge of the relevant and non-relevant hits in the
             set of retrieved documents




Qualitative evaluation

      -> $\text{recall} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$

           - fraction of the relevant documents in the collection that are
             retrieved

           - requires knowledge not only of the relevant and retrieved
             documents but also of those not retrieved
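
The two measures can be computed directly from sets of document identifiers. The following is a minimal sketch; the document ids are made up, and returning 0.0 for an empty denominator is a convention chosen here, not something the slides specify.

```python
def precision(relevant, retrieved):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # assumed convention for an empty result set
    return len(set(relevant) & retrieved) / len(retrieved)


def recall(relevant, retrieved):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention when nothing is relevant
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical example: 4 relevant documents, 3 retrieved, 2 in common.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

print(precision(relevant, retrieved))  # 2 of 3 retrieved are relevant
print(recall(relevant, retrieved))     # 2 of 4 relevant were retrieved: 0.5
```

Note that `recall` needs the full set of relevant documents, including those the search never returned, which is what makes it expensive to measure on a real archive.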




Qualitative evaluation

      • There is often an inverse relationship between precision and recall:
        increasing one tends to reduce the other

      • Which of the two matters more depends on the use case

           -> in some use cases only the hits at the top of the list have to be
              relevant, and there is no interest in looking at every relevant
              document (high precision)

           -> in other use cases we want recall to be as high as possible and
              accept low precision in the results
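
The trade-off can be made concrete by cutting a ranked result list at different depths. This is a small sketch with an invented ranking and invented relevance judgements; the function name `precision_recall_at_k` is hypothetical, and it assumes `k >= 1` and a non-empty set of relevant documents.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top-k ranked hits are shown."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranking; d1..d4 are the relevant documents.
ranked = ["d1", "d2", "d7", "d3", "d8", "d9", "d4"]
relevant = {"d1", "d2", "d3", "d4"}

for k in (2, 4, 7):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
# k=2: precision=1.00, recall=0.50
# k=4: precision=0.75, recall=0.75
# k=7: precision=0.57, recall=1.00
```

Showing only the top two hits favours precision; showing all seven reaches full recall at the cost of more irrelevant hits.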




Trouvaille

      [Figure: precision (vertical axis) plotted against recall (horizontal axis),
       with two positions marked: “Current Search” and “Google”]
Trouvaille

      • Thesaurus application:
          • During search: keywords in auto-completion, spellcheck and
             synonyms
      • User-friendly interface:
          • Faceted search: programme, genre, journalist
          • Different output views: keywords, thumbnails, Google Maps
      • Use of a standard: NewsML-G2
      • Metadata is time-coded
          -> Matching keyframe
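
One possible reading of "matching keyframe" is that a time-coded annotation is shown with the keyframe closest before it. The sketch below illustrates that reading only; the slides do not say how Trouvaille selects keyframes, and the function name, the keyframe times (in seconds) and the fallback to the first keyframe are all assumptions. It expects a non-empty, sorted list.

```python
from bisect import bisect_right


def matching_keyframe(keyframe_times, timecode):
    """Return the latest keyframe time at or before `timecode` (seconds).

    `keyframe_times` must be sorted ascending and non-empty. If the
    timecode precedes every keyframe, the first keyframe is returned.
    """
    i = bisect_right(keyframe_times, timecode)
    if i == 0:
        return keyframe_times[0]
    return keyframe_times[i - 1]


# Hypothetical keyframes at 0'00", 0'50", 4'00" and 7'50".
keyframes = [0, 50, 240, 470]

print(matching_keyframe(keyframes, 300))  # 240
print(matching_keyframe(keyframes, 50))   # 50
```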




Trouvaille: future work

      • Clustering: integration of copy detection to find duplicates in the
        retrieved hits
      • Intelligent information clustering: detection of concept relationships
      • Feature extraction: topic detection
      • Combination of system quality and user satisfaction for the evaluation

      [Figure: precision (vertical axis, up to 100%) plotted against recall
       (horizontal axis, up to 100%), with positions marked for “Google”,
       “Current Search”, “Trouvaille (MS1)”, “Feature extraction” and
       “Intelligent Information clustering”]
Trouvaille



