PISA
Production, Indexing and Search
of Audio-visual Material
 The mathematical logic behind search and retrieval
          of audiovisual material
           Valérie De Witte, VRT-medialab
Archiving

[Screenshot of the legacy archive system: a search form next to the matching archive record. Field labels are translated; record values are shown as catalogued, in Dutch.]

Search screen FILM (Opzoekscherm FILM)    Set: 16    Count: 1    page 1 of 3
 keywords (trefwoorden): ibm and vrt
 Empty search fields: archive number, broadcast year, month, day, fragment number,
   fragment duration, series, format, tape number, episode, episode number, programme,
   broadcast date, fragment title, text, category, recording date, recording number,
   journalist, rights holder
 SETS: The strings required for the operation are not defined
 Function keys: F11 Exit (Eindigen), F12 Sets, F13 Refset, F14 Show (Toon), F17 Previous (Vorige),
   F18 Next/Clear (Volg/Leeg), F19 Thesaurus, F20 Command (Commando), Ent Search (Opzoeken)

Matching record
 archive number    : ALG 20010813 1
 fragment number   : 1
 series            : 1000 ZONNEN EN GARNALEN
 tape number       : E03024404
 format            : DBCM
 fragment title    : 1000 ZONNEN & GARNALEN
 picture           : KL/PALPLUS
 fragment duration : 18 20
 text              : 0'00" TOERISTISCH REPORTAGEMAGAZINE OVERZICHT ONDERWERPEN GENERIEK
                     TOERISTISCH REPORTAGEMAGAZINE, OVERZICHT ONDERWERPEN
                     0'50" VANDAAG : KUNSTENAAR LUC HOFKENS ONTWIERP EEN OASE OP ZIJN
                     DAKTERRAS IN BORGERHOUT DIE DOET DENKEN AAN DE GRAND CANYON
                     INTERVIEW MET LUC EN ZIJN VROUW MARILOU BUITENBEELD DAK MET
                     OMGEVING BUITENKANT ARBEIDERSWONING, PANO OVER ROTSWANDEN,
                     KRATEN MET WATER, BEPANTING, FOTOALBUM MET VERLOOP WERKEN
                     4'00" JUNIOR : KLAARTJE ALAERTS, 13 JAAR WIL ASTRONAUTEN WORDEN
                     ZE BEZOEKT HET EUROSPACE CENTER MET RUIMTEVEREN, RAKETTEN
                     SIMULATIE IN RUIMTEVEER, INTERVIEW, HEEFT EEN UFO GEZIEN
                     MAAKT ZELF KLEIN RAKETJE, SCHIET HET AF
                     7'50" DE SCHEURKALENDER : ARCHIEF RECLAMEFILM IBM INTERVIEW
                     MAURICE DE WILDE, EERSTE PERSOONLIJKE COMPUTER
 keywords          : BELGIE; BORGERHOUT; ARTIEST; OASE; KUNST; GRAND CANYON (NATUURGEBIED);
                     DAK; TERRAS; INTERVIEW; EURO SPACE CENTER; RUIMTEVAART; PC; BOOTTOCHT;
                     RIJKDOM; PASSAGIER; GASTRONOMIE; RESTAURANT; PERSONEEL; VAKANTIE;
                     BINNENBEELD; SCHIP; BECKERS LEEN; VRT; LOTTO; RADIOOMROEPSTER;
                     KLANKSTUDIO; UITVINDING; BARBECUE; BETONMOLEN; IBM; RECLAMESPOT
 rights holder     : VRT

medialab
Issues




               -> “Annotation” provides structured metadata and
                  needs to become scalable for the increasing set
                  of information

               -> Automated processing of information is a key
                  issue, but it requires correct and structured
                  metadata

               -> Product Engineering is the source of structured
                  and meaningful information




Alternative solution




Milestone 1 – Searching Audiovisual Material
    Assumptions:
    • A “scene” is the logical unit of search

    The ideal search engine:
    • retrieves all relevant items (recall 100%)
    • without false positives (precision 100%)
    • provides grouping of similar results
    • gives instant access to digital media
    • while respecting intellectual property rights.

    [Architecture diagram; components shown:
     Legacy Video Library (Basisplus), Raw Material (EBU Superpop), Actual news items (Ardome),
     NewsML-G2, Media Asset Management System (Ardome), Search Engine (Lucene/SOLR),
     Search Client (Custom Development)]

Milestone 2 – Computer Assisted Analysis
    • Shot segmentation
    • Audio classification
    • Face detection
    • Face recognition
    • Scene detection
    • Subtitle processing
    • Topic recognition

    [Architecture diagram; components shown:
     Legacy Video Library (Basisplus), Raw Material (EBU Superpop), Actual news items (Ardome),
     NewsML-G2, Media Asset Management System (Ardome), Search Engine (Lucene/SOLR),
     Media Production, and the analysis modules Shot Segmentation, Face Detection,
     Scene Detection and Topic Recognition]

Search systems

      Current search implementations are excellent in terms of search capabilities:
                - Boolean logic (AND-, OR- and NOT-operators)
                - truncation (plural, stemming, capital letters)
                - thesaurus (synonyms, homonyms,…)
                - structured metadata and range search
                - single word and phrase searching
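
      As a rough illustration of the Boolean operators listed above, the following minimal sketch builds a tiny inverted index and evaluates AND, OR and NOT as set operations. The sample descriptions, the document ids and all function names are hypothetical and are not taken from the systems discussed in these slides.

```python
from collections import defaultdict

# Hypothetical mini-archive: fragment id -> free-text description.
DOCS = {
    1: "archive commercial ibm interview first personal computer vrt",
    2: "vrt radio presenter sound studio",
    3: "ibm mainframe commercial",
}


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search_and(index, *terms):
    """Documents containing every term (set intersection)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, *terms):
    """Documents containing at least one term (set union)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_ids, term):
    """Documents that do not contain the term (set difference)."""
    return set(all_ids) - index.get(term.lower(), set())


INDEX = build_index(DOCS)
print(sorted(search_and(INDEX, "ibm", "vrt")))  # [1]
print(sorted(search_or(INDEX, "ibm", "vrt")))   # [1, 2, 3]
print(sorted(search_not(INDEX, DOCS, "ibm")))   # [2]
```

      The AND query mirrors the "ibm and vrt" keyword search on the Archiving slide; stemming, thesaurus lookups and phrase search would need additional processing that this sketch leaves out.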

      But… retrieval efficiency also matters:
                - coverage (composition of the index, which parts of the documents
                  are indexed, update frequency)
                - response time (average waiting time between issuing a search
                  command and displaying the first batch of results on the screen)
                - user effort (user-friendly interface)
                - output options (number of output options, layout, clarity)




Qualitative evaluation

      -> $\text{precision} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$

           - fraction of the returned results that are relevant

           - requires knowledge of the relevant and non-relevant hits in the
             set of retrieved documents




Qualitative evaluation

      -> $\text{recall} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$

           - fraction of the relevant documents in the collection that are
             retrieved

           - requires knowledge not only of the relevant and retrieved
             documents but also of those not retrieved
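
           Both measures can be computed directly from two sets of document ids. The sketch below is a minimal illustration; the scene ids are made up, and returning 0.0 when a denominator is empty is an assumed convention, not something stated in the slides.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document ids.

    Assumption: an empty denominator yields 0.0 instead of raising an error.
    """
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgement: four scenes are relevant, the engine returns three.
relevant_scenes = {"s1", "s2", "s3", "s4"}
retrieved_scenes = {"s1", "s2", "s5"}

p, r = precision_recall(relevant_scenes, retrieved_scenes)
# Two of the three retrieved scenes are relevant: precision is 2/3.
# Two of the four relevant scenes were retrieved: recall is 0.5.
print(f"precision={p:.2f} recall={r:.2f}")
```

           Note that computing recall needs the full set of relevant scenes, including those the engine missed, which is why it is the harder of the two to measure in practice.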




Qualitative evaluation

      • There is often an inverse relationship between precision and recall:
        increasing one tends to reduce the other

      • Which of recall and precision matters more depends on the use case

           -> in some use cases only the hits at the top of the list have to be
              relevant and there is no interest in looking at every relevant
              document (high precision)

           -> in other use cases we want recall to be as high as possible and
              we tolerate low-precision results
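
      The trade-off can be made concrete by cutting a ranked result list at different depths k. The following sketch uses an invented ranking and invented relevance judgements; the function name and the choice to divide by k are assumptions for illustration.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top-k ranked results are shown.

    Assumes 1 <= k <= len(ranked) and a non-empty set of relevant documents.
    """
    if not 1 <= k <= len(ranked):
        raise ValueError("k must be between 1 and the length of the ranking")
    if not relevant:
        raise ValueError("relevant must not be empty")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranking returned by a search engine, best hit first.
ranking = ["a", "b", "x", "c", "y", "z", "d"]
relevant_docs = {"a", "b", "c", "d"}

for k in (2, 4, 7):
    p_at_k, r_at_k = precision_recall_at_k(ranking, relevant_docs, k)
    print(f"k={k}: precision={p_at_k:.2f} recall={r_at_k:.2f}")
```

      In this invented example, showing only the top 2 hits gives precision 1.0 at recall 0.5, while showing all 7 hits reaches recall 1.0 at a precision of about 0.57.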




Trouvaille

      [Chart: precision (vertical axis) against recall (horizontal axis); labelled points: “Google” and “Actual Search”]

Trouvaille

      • Thesaurus application:
          • During search: keywords in auto-completion, spellcheck and
             synonyms
      • User-friendly interface:
          • Faceted search: programme, genre, journalist
          • Different output views: keywords, thumbnails, Google Maps
      • Use of a standard: NewsML-G2
      • Metadata is time-coded
          -> Matching keyframe
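
      Faceted search boils down to counting the values of selected metadata fields over the current hit list and letting the user narrow the list by picking a value. The sketch below shows that idea in plain Python; the records, field values and function names are hypothetical and do not reflect the actual Trouvaille implementation.

```python
from collections import Counter

# Hypothetical hit list; every value below is invented for illustration.
HITS = [
    {"title": "Item 1", "programme": "News", "genre": "current affairs", "journalist": "A"},
    {"title": "Item 2", "programme": "News", "genre": "sport", "journalist": "B"},
    {"title": "Item 3", "programme": "Magazine", "genre": "travel", "journalist": "A"},
]


def facet_counts(hits, fields):
    """Count how often each value occurs per facet field in the hit list."""
    return {field: Counter(h[field] for h in hits if field in h) for field in fields}


def filter_hits(hits, **selected):
    """Keep only the hits whose facet values match every selected value."""
    return [h for h in hits if all(h.get(k) == v for k, v in selected.items())]


facets = facet_counts(HITS, ["programme", "genre", "journalist"])
print(facets["programme"])  # how many hits per programme

narrowed = filter_hits(HITS, programme="News", journalist="A")
print([h["title"] for h in narrowed])  # ['Item 1']
```

      A search engine such as the Lucene/SOLR component named earlier would normally compute these counts itself; the sketch only shows the underlying idea.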




Trouvaille: future work

      • Clustering: integration of copy detection to find duplicates in the retrieved hits
      • Intelligent Information Clustering: detection of concept relationships
      • Feature extraction: Topic detection
      • Combination of system quality and user satisfaction for the evaluation

      [Chart: precision (vertical axis, up to 100%) against recall (horizontal axis, up to 100%); labelled points: Google, Actual Search, Trouvaille (MS1), Feature extraction, Intelligent Information clustering]

Trouvaille




