This document summarizes several computer vision and audio analysis techniques demonstrated in the PISA project, including shot cut detection, scene segmentation, video reuse detection, face detection, face recognition, and audio classification. The face detection method uses skin color segmentation and shape analysis to find face candidates and reports 92% good face detections. Face recognition is performed using a 3D morphable model fitted to 2D images.
PISA Production, Indexing and Search of Audio-visual Material
1. PISA
Production, Indexing and Search of Audio-visual Material
Image Processing
Tinne Tuytelaars, IBBT – PSI – K.U.Leuven
2. Computer Assisted Analysis
• Intelligent analysis = reverse engineering
• Shot cut detection demo
• Scene segmentation demo
• Video reuse detection demo
• Face detection
• Face recognition demo
• Audio classification demo
3. Shot cut detection
= Split the video stream into atomic units (shots), each corresponding to a continuously moving camera
• Distinguish between abrupt and smooth shot cuts
• Experimented with different methods
  - Using color histograms
  - Using affine motion compensation
  - Using motion estimation within the compressed domain
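
The color histogram variant can be illustrated with a minimal sketch. This is not the PISA implementation: the frame representation (a flat list of RGB tuples), the function names, the bin count, and the threshold of 0.5 are all assumptions chosen for illustration. It only flags abrupt cuts; smooth transitions such as dissolves would need a comparison over a longer window.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Frame = Sequence[Pixel]        # hypothetical format: a frame flattened to pixels


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins**3 entries."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each channel value 0..255 to a bin index 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(rb * bins + gb) * bins + bb] += 1.0
    total = float(len(frame)) or 1.0
    return [h / total for h in hist]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_abrupt_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return each index i where a cut is assumed between frame i-1 and frame i.

    The threshold is an illustrative assumption and would need tuning.
    """
    hists = [color_histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]
```

For a sequence of three uniformly red frames followed by two uniformly blue frames, `detect_abrupt_cuts` reports a single cut at index 3.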
5. Video reuse detection
The same video material is often reused
• Reuse can be detected automatically
• Robust to post-processing
• Efficient
• Based on spatio-temporal local features and locality-sensitive hashing
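
The hashing step can be sketched as follows. The slides do not say which hash family was used, so random-hyperplane hashing is an assumption here, as are the class name `HyperplaneLSH`, the helper `vote_for_source`, and the idea of voting per clip. Descriptor extraction (the spatio-temporal local features) is out of scope; descriptors are taken as plain lists of floats.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


class HyperplaneLSH:
    """Random-hyperplane LSH index (an assumed hash family, for illustration)."""

    def __init__(self, dim: int, n_bits: int = 12, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One random hyperplane per hash bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[bool, ...], List[str]] = defaultdict(list)

    def _key(self, vec: Sequence[float]) -> Tuple[bool, ...]:
        # Each bit records on which side of a hyperplane the vector lies.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in self.planes
        )

    def add(self, vec: Sequence[float], clip_id: str) -> None:
        self.buckets[self._key(vec)].append(clip_id)

    def query(self, vec: Sequence[float]) -> List[str]:
        """Clip ids of all indexed descriptors that share the query's bucket."""
        return list(self.buckets.get(self._key(vec), []))


def vote_for_source(index: HyperplaneLSH, descriptors: Sequence[Sequence[float]]) -> Counter:
    """Count, per indexed clip, how many bucket collisions the query descriptors get."""
    votes: Counter = Counter()
    for d in descriptors:
        votes.update(index.query(d))
    return votes
```

Similar descriptors tend to land in the same bucket, so a query only touches a small part of the archive, which is where the efficiency comes from. A practical system would typically use several hash tables and verify the candidates afterwards.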
6. Face Detection
• Face candidate selection:
  - Candidate regions have skin color → region-based skin segmentation
  - Personalized chrominance skin boundary
• Verification based on cues:
  - Shape of ellipse
  - Ellipse-filling percentage
  - Gray-tone smoothness
  - Corners of the facial features
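
Two of the steps above, skin segmentation in chrominance space and the ellipse-filling cue, can be sketched roughly as below. Several things are assumptions: the fixed Cb/Cr box is a commonly used generic default, whereas the slides describe a personalized boundary; the ellipse is simply the axis-aligned one inscribed in the bounding box of the skin pixels rather than a fitted one; and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def rgb_to_cbcr(r: int, g: int, b: int) -> Tuple[float, float]:
    """Chrominance components of the full-range BT.601 YCbCr transform."""
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(
    image: Sequence[Sequence[Pixel]],
    cb_range: Tuple[float, float] = (77.0, 127.0),   # assumed generic bounds
    cr_range: Tuple[float, float] = (133.0, 173.0),  # assumed generic bounds
) -> List[List[bool]]:
    """Mark pixels whose chrominance falls inside the skin box."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            mask_row.append(
                cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
            )
        mask.append(mask_row)
    return mask


def ellipse_fill_ratio(mask: Sequence[Sequence[bool]]) -> float:
    """Fraction of the ellipse inscribed in the skin bounding box that is skin."""
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return 0.0
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    cy, cx = (y0 + y1) / 2.0, (x0 + x1) / 2.0
    ry, rx = (y1 - y0 + 1) / 2.0, (x1 - x0 + 1) / 2.0
    inside = filled = 0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0:
                inside += 1
                filled += 1 if mask[y][x] else 0
    return filled / inside if inside else 0.0
```

A compact, face-like blob gives a ratio close to 1, while an elongated or ragged skin region (an arm, a diagonal streak) gives a low ratio and can be rejected. Passing per-person `cb_range` and `cr_range` values is one way the personalized boundary could be plugged in.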
7. Face detection results
• 92% good face detections
  - Comparable to the state-of-the-art learning-based face detector of Viola & Jones, which obtains 93%
  - Adapts to lighting conditions and to personal facial appearance
8. Face recognition
• Based on a 3D morphable model
• 3D model is fitted to 2D image
• Shape and texture parameters used as face descriptor
• Robust to
  - viewpoint changes,
  - illumination changes,
  - partial occlusions.
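
Once the model has been fitted, recognition reduces to comparing descriptors. The sketch below covers only that last step and assumes the fitting stage has already produced shape and texture coefficients. The slides do not state the comparison measure, so cosine similarity, nearest-neighbor matching, and the names `face_descriptor` and `identify` are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def face_descriptor(shape_params: Sequence[float], texture_params: Sequence[float]) -> List[float]:
    """Concatenate fitted shape and texture coefficients into one vector."""
    return list(shape_params) + list(texture_params)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(probe: Sequence[float], gallery: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the gallery identity most similar to the probe, with its score.

    Raises ValueError when the gallery is empty.
    """
    return max(
        ((name, cosine_similarity(probe, desc)) for name, desc in gallery.items()),
        key=lambda item: item[1],
    )
```

Because the coefficients describe the 3D face itself rather than its appearance in one particular image, the same comparison can be applied to images taken under different viewpoints and lighting, which is the robustness the list above refers to.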
14. Future work
• Include facial expressions in face recognition
• Multi-modal scene segmentation
• Feedback loop from Trouvaille
• Object or scene recognition
• Thesaurus-based speech recognition