My dissertation work has involved proposing a higher-level visual representation that enhances the traditional part-based bag of visual words (BOW) representation in several aspects. Firstly, we introduce a new multilayer semantic significance analysis (MSSA) model to select semantically significant visual words (SSVWs) from the classical visual words in order to overcome the noisiness of the feature quantization process (in collaboration with Prof. Zhongfei (Mark) Zhang, State University of New York at Binghamton). Secondly, we represent the spatial and color constitution of an image as a mixture of n Gaussians in the feature space in order to propose a new spatial weighting scheme that weights each SSVW according to the probability that it belongs to each of the n Gaussians. Thirdly, we strengthen the discrimination power of SSVWs by constructing semantically significant visual phrases (SSVPs) from SSVWs that frequently co-occur in the same local context and are semantically coherent. Finally, we introduce a new multiclass vote-based classifier (MVBC) based on this new higher-level visual representation. Extensive large-scale experimental results show that the proposed higher-level visual representation, the Semantically Significant Invariant Visual Glossary (SSIVG), outperforms the traditional part-based image representation in retrieval, classification, object recognition, visual ranking, and visual summarization.
A Higher-Level Visual Representation For Semantic Learning In Image Databases
1. A Higher-Level Visual Representation For Semantic Learning In Image Databases. Ismail EL SAYAD, 18/07/2011
2. Overview: Introduction; Related works; Our approach: Enhanced Bag of Visual Words (E-BOW), Multilayer Semantically Significant Analysis Model (MSSA), Semantically Significant Invariant Visual Glossary (SSIVG); Experiments: Image retrieval, Image classification, Object recognition; Conclusion and perspectives
3. Motivation: Digital content grows rapidly (personal acquisition devices, broadcast TV, surveillance). It is relatively easy to store, but useless without automatic processing, classification, and retrieval. The usual way to solve this problem is to describe images with keywords, but this method suffers from subjectivity, text ambiguity, and the lack of automatic annotation.
4. Visual representations divide into image-based representations and part-based representations. Image-based representations rely on global visual features extracted over the whole image, such as color, color moments, shape, or texture.
5. The main drawbacks of image-based representations: high sensitivity to scale, pose, lighting condition changes, and occlusions; they cannot capture the local information of an image. Part-based representations are based on the statistics of features extracted from segmented image regions.
6. Part-based representations (bag of visual words): compute local descriptors over the image, cluster them in the feature space to build a visual word vocabulary (VW1, VW2, VW3, VW4, ...), then represent each image as a histogram of visual word frequencies.
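The quantize-and-count step of the BOW pipeline can be sketched in a few lines (the 2-D descriptors and 3-word vocabulary below are toy values for illustration; real systems quantize SIFT/SURF descriptors against a vocabulary of thousands of words learned by k-means):

```python
from collections import Counter

def bow_histogram(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest visual word
    (Euclidean distance) and count occurrences per word."""
    def nearest(d):
        return min(range(len(vocabulary)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocabulary[i])))
    counts = Counter(nearest(d) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]

# Toy 2-D "descriptors" and a hypothetical 3-word vocabulary.
vocab = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descs = [(0.1, 0.0), (0.9, 1.1), (5.2, 4.9), (0.0, 0.2)]
print(bow_histogram(descs, vocab))  # [2, 1, 1]
```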
9. Drawbacks of the BOW representation: the position of visual words is ignored; using only keypoint-based intensity descriptors, neither shape nor color information is used; and feature quantization noisiness generates unnecessary and insignificant visual words.
11. Objectives: an enhanced BOW representation using different local information (intensity, color, shape, ...), the spatial constitution of the image, and an efficient visual word vocabulary structure; and a higher-level visual representation that is less noisy, more discriminative, and more invariant to visual diversity.
12. Overview of the proposed higher-level visual representation: from a set of images, build the visual word vocabulary and the E-BOW representation, learn the MSSA model, then generate SSIVWs and SSIVPs to form the SSIVG representation.
13. Related works: Spatial Pyramid Matching Kernel (SPM) and sparse coding; visual phrase and descriptive visual phrase; visual phrase pattern and visual synset.
14. Spatial Pyramid Matching Kernel (SPM) and sparse coding. Lazebnik et al. [CVPR06] propose the Spatial Pyramid Matching Kernel (SPM), exploiting the spatial information of local regions. Yang et al. [CVPR09] combine SPM with sparse coding, replacing k-means in the SPM.
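The pyramid construction behind SPM can be sketched as follows (a minimal version that only concatenates per-cell BOW histograms; SPM's per-level weighting and the matching kernel itself are omitted, and the toy points, labels, and grid sizes are assumptions):

```python
def spm_histogram(points, labels, n_words, levels=2):
    """Spatial pyramid: concatenate per-cell BOW histograms over grids of
    1x1, 2x2, ..., 2^l x 2^l cells (coordinates normalized to [0, 1))."""
    feats = []
    for l in range(levels + 1):
        g = 2 ** l
        cells = [[0] * n_words for _ in range(g * g)]
        for (x, y), w in zip(points, labels):
            cx, cy = min(int(x * g), g - 1), min(int(y * g), g - 1)
            cells[cy * g + cx][w] += 1
        for c in cells:
            feats.extend(c)
    return feats

# Two keypoints with visual-word labels 0 and 1 (toy example).
pts, lbl = [(0.1, 0.1), (0.9, 0.9)], [0, 1]
h = spm_histogram(pts, lbl, n_words=2, levels=1)
print(len(h))  # 2 (level 0) + 4*2 (level 1) = 10
```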
15. Visual phrase and descriptive visual phrase. Zheng and Gao [TOMCCAP08] define a visual phrase as a pair of spatially adjacent local image patches. Zhang et al. [ACM MM09] select descriptive visual phrases according to the frequencies of their constituent visual word pairs.
16. Visual phrase pattern and visual synset. Yuan et al. [CVPR07] define a visual phrase pattern as a spatially co-occurring group of visual words. Zheng et al. [CVPR08] define a visual synset as a relevance-consistent group of visual words or phrases, in the spirit of the text synset.
17. Comparison of the different enhancements of the BOW.
18. Our approach: Enhanced Bag of Visual Words (E-BOW); Multilayer Semantically Significant Analysis Model (MSSA); Semantically Significant Invariant Visual Glossary (SSIVG).
19. Enhanced Bag of Visual Words (E-BOW): from a set of images, SURF and edge context extraction, feature fusion, and hierarchical feature quantization yield the E-BOW representation, which feeds the MSSA model and the SSIVG representation.
20. E-BOW feature extraction pipeline: interest point detection and edge point detection; color filtering using a vector median filter (VMF); SURF feature vector extraction at each interest point; edge context feature vector extraction at each interest point; color feature extraction at each interest and edge point; fusion of the SURF and edge context feature vectors; color and position vector clustering using a Gaussian mixture model; collection of all vectors for the whole image set; HAC and divisive hierarchical k-means clustering to build the visual word (VW) vocabulary.
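The spatial weighting idea from the abstract, weighting a visual word by the posterior probability that its (color, position) vector belongs to each Gaussian of the fitted mixture, can be illustrated with a minimal sketch (isotropic Gaussians and the toy parameters below are assumptions; the thesis fits a full Gaussian mixture model to the data):

```python
import math

def gaussian_pdf(x, mean, var):
    """Isotropic Gaussian density in len(x) dimensions (covariance = var * I)."""
    d = len(x)
    sq = sum((a - m) ** 2 for a, m in zip(x, mean))
    return math.exp(-sq / (2 * var)) / ((2 * math.pi * var) ** (d / 2))

def responsibilities(x, means, variances, priors):
    """Posterior probability that point x belongs to each mixture component."""
    joint = [p * gaussian_pdf(x, m, v) for m, v, p in zip(means, variances, priors)]
    total = sum(joint)
    return [j / total for j in joint]

# Two spatial components with illustrative parameters.
means, variances, priors = [(0.0, 0.0), (10.0, 10.0)], [1.0, 1.0], [0.5, 0.5]
r = responsibilities((0.2, -0.1), means, variances, priors)
print(r[0] > 0.99)  # True: the point almost certainly belongs to component 0
```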
21. Feature extraction (SURF). SURF is a low-level feature descriptor that describes how pixel intensities are distributed within a scale-dependent neighborhood of each interest point. It is good at handling serious blurring and image rotation, poor at handling illumination changes, and efficient to compute.
22. Feature extraction (edge context descriptor). The edge context descriptor is represented at each interest point as a histogram: 6 bins for the magnitude of the vectors drawn to the edge points and 4 bins for the orientation angle.
23. The edge context descriptor is invariant to: translation, since the distribution of the edge points is measured with respect to fixed points; scale, since the radial distance is normalized by the mean distance over the whole set of points within the same Gaussian; and rotation, since all angles are measured relative to the tangent angle of each interest point.
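As an illustration only, the binning described on the two slides above can be sketched as follows (the 6 magnitude bins, 4 orientation bins, mean-distance normalization, and tangent-relative angles come from the slides; the exact radial bin boundaries are assumptions):

```python
import math

def edge_context(interest_pt, tangent_angle, edge_pts, n_mag=6, n_ang=4):
    """Histogram of edge points around one interest point: n_mag bins for the
    normalized radial distance, n_ang bins for the angle measured relative to
    the interest point's tangent (for rotation invariance)."""
    dists = [math.hypot(ex - interest_pt[0], ey - interest_pt[1]) for ex, ey in edge_pts]
    mean_d = sum(dists) / len(dists)  # scale normalization by the mean distance
    hist = [[0] * n_ang for _ in range(n_mag)]
    for (ex, ey), d in zip(edge_pts, dists):
        ang = (math.atan2(ey - interest_pt[1], ex - interest_pt[0]) - tangent_angle) % (2 * math.pi)
        mi = min(int(d / mean_d * n_mag / 2), n_mag - 1)  # assumed radial binning
        ai = min(int(ang / (2 * math.pi) * n_ang), n_ang - 1)
        hist[mi][ai] += 1
    return hist

# Four edge points around an interest point at the origin (toy example).
h = edge_context((0.0, 0.0), 0.0, [(1, 0), (0, 1), (-1, 0), (0, -1)])
print(sum(sum(row) for row in h))  # 4: every edge point lands in exactly one bin
```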
25. Multilayer Semantically Significant Analysis (MSSA) model: starting from the E-BOW representation, the number of latent topics is estimated, the generative process is defined, the model parameters are estimated, and the semantic inference of the VWs is computed, leading to the SSIVG representation.
26. MSSA generative process: the same higher-level aspect (e.g., people) can have different visual aspects, so a topic model that considers this hierarchical structure is needed.
37. Semantically Significant Invariant Visual Glossary (SSIVG) representation: from the E-BOW representation and the learned MSSA model, SSVWs are selected through the VWs' semantic inference; SSVPs are generated from the SSVWs; a divisive information-theoretic clustering then re-indexes the SSVWs and SSVPs into SSIVWs and SSIVPs, which together form the SSIVG representation.
38. Semantically Significant Visual Word (SSVW): from the set of VWs, the set of relevant visual topics is estimated using the MSSA model, and the SSVWs are selected accordingly.
44. Semantically Significant Visual Phrase (SSVP): example phrases (SSIVP126, SSIVP326, SSIVP304) detected across different images.
45. Invariance problem: studying co-occurrence and spatial scatter information makes the image representation more discriminative, but the invariance power of SSVWs and SSVPs is still low. In text documents, synonymous words can be clustered into one synonymy set to improve document categorization performance.
46. SSIVG: a higher-level visual representation composed of two layers: Semantically Significant Invariant Visual Words (SSIVWs), which are SSVWs re-indexed after a distributional clustering, and Semantically Significant Invariant Visual Phrases (SSIVPs), which are SSVPs re-indexed after a distributional clustering. From the set of SSVWs and SSVPs and the relevant visual topics estimated using MSSA, a divisive information-theoretic clustering yields the sets of SSIVWs and SSIVPs that form the SSIVG.
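A minimal sketch of distributional clustering, assuming words are grouped by the similarity of their per-word topic distributions P(topic | word) under Jensen-Shannon divergence (a greedy single pass for illustration, not the divisive information-theoretic algorithm itself; the threshold and toy distributions are assumptions):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Symmetric Jensen-Shannon divergence between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def cluster_words(word_topic, threshold=0.05):
    """Greedy grouping: a word joins the first cluster whose representative
    topic distribution lies within `threshold` JS divergence."""
    clusters = []  # list of (representative distribution, [word ids])
    for w, dist in enumerate(word_topic):
        for rep, members in clusters:
            if js(dist, rep) < threshold:
                members.append(w)
                break
        else:
            clusters.append((dist, [w]))
    return [members for _, members in clusters]

# Three "visual words" over 2 topics: words 0 and 1 share topic usage (toy values).
wt = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]
print(cluster_words(wt))  # [[0, 1], [2]]
```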
47. Experiments: image retrieval, image classification, object recognition.
53. Assessment of the SSIVG representation performance in image retrieval.
54. Assessment of the SSIVG representation performance in image retrieval.
57. Multiclass Vote-Based Classifier (MVBC) compared with an SVM with a linear kernel.
58. Multiclass Vote-Based Classifier (MVBC): for each SSIVW or SSIVP occurring in an image, we detect the high latent topic that maximizes its conditional probability; the final voting score for a high latent topic accumulates the votes of all occurring terms, and each image is categorized according to the dominant high latent topic.
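The voting logic described above can be sketched as follows (the table of P(topic | term) values and the optional per-term weighting are illustrative assumptions, not the thesis's fitted model or exact scoring formula):

```python
from collections import defaultdict

def mvbc_predict(image_terms, term_topic_prob, term_weight=None):
    """Each term (SSIVW/SSIVP) in the image votes for the high-level latent
    topic maximizing P(topic | term); votes accumulate (optionally weighted)
    and the image takes the topic with the highest total score."""
    term_weight = term_weight or {}
    scores = defaultdict(float)
    for t in image_terms:
        probs = term_topic_prob[t]
        best_topic = max(range(len(probs)), key=probs.__getitem__)
        scores[best_topic] += term_weight.get(t, 1.0) * probs[best_topic]
    return max(scores, key=scores.get)

# Toy P(topic | term) table for 3 terms over 2 high latent topics.
ptt = {0: [0.8, 0.2], 1: [0.7, 0.3], 2: [0.4, 0.6]}
print(mvbc_predict([0, 1, 2], ptt))  # 0: topic 0 collects 1.5 votes vs. 0.6
```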
59. Evaluation of the SSIVG representation performance in classification.
61. Assessment of the SSIVG representation performance in object recognition.
62. Conclusion and perspectives.
Talk briefly about the introduction; then talk briefly about the different parts of the approach.
Nowadays, images can be described using their visual content.
Talk about the analogy between text and images. Different visual appearance: the same visual word can appear in two different images describing different semantics.
This work aims at addressing these drawbacks with the following objectives.
All the related works are based on the BOW representation; they propose different higher-level representations.
The spatial pyramid extends the BOW representation. Example of a spatial pyramid with three different spatial levels and resolutions.
Zhang et al. enhanced this approach by selecting descriptive visual phrases from the constructed visual phrases according to the frequencies of their constituent visual word pairs.
Other approaches use an efficient structure for the visual word vocabulary only, whereas we use it for both words and phrases.
Motivation of the edge context
This generative process leads to the following conditional probability distribution. Following the maximum likelihood principle, one can estimate the parameters by maximizing the log-likelihood function.
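A hedged sketch of what such a two-layer aspect-model likelihood typically looks like, following pLSA conventions with images d, high latent topics h_l, visual latent topics z_k, and visual words w (the notation is assumed, not copied from the thesis):

```latex
% Conditional probability of a visual word given an image,
% marginalizing over the two latent layers:
P(w \mid d) \;=\; \sum_{l=1}^{L} P(h_l \mid d) \sum_{k=1}^{K} P(z_k \mid h_l)\, P(w \mid z_k)

% Log-likelihood maximized over the model parameters,
% where n(d, w) is the count of word w in image d:
\mathcal{L} \;=\; \sum_{d} \sum_{w} n(d, w) \, \log P(w \mid d)
```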
The number of high latent topics, L, and the number of visual latent topics, K, are determined in advance for the model fitting, based on the Minimum Description Length (MDL) principle.
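For reference, the MDL criterion for this kind of model selection is usually written in the following generic form, with m_{K,L} the number of free parameters of the model and N the number of observations (the exact form used in the thesis may differ):

```latex
(K, L)^{*} \;=\; \arg\min_{K, L}
\left[ -\log P\!\left(\text{data} \mid \hat{\theta}_{K,L}\right)
\;+\; \frac{m_{K,L}}{2} \log N \right]
```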
Parameters update: it will be essential to design online algorithms to continuously (re-)learn the parameters of the proposed MSSA model, as the content of digital databases is modified by the regular upload or deletion of images. Invariance issue: it will be interesting to investigate the invariance issue further, especially in the context of large-scale databases where large intra-class variations can occur. Cross-modality extension: the proposed higher-level visual representation can be extended to video content; the extension can be based on cross-modal data (visual content and textual closed captions). Video summarization: a new generic framework for video summarization based on the extended higher-level semantic representation of video content can be designed. Mention that this work is applied at the frame level.