Proceedings of the International Conference on Emerging Trends in Engineering and Management (ICETEM14)
30 – 31, December 2014, Ernakulam, India
the second level uses statistical features, and the third level uses a combination of color and statistical features for image
retrieval. Each level uses a distance measure to calculate the similarity between the images. This paper analyses the
performance of these three levels and presents the results.
2. RELATED WORKS
Content based image retrieval algorithms compare the actual content of the images rather than text. Once the
specified feature has been extracted from the image, there are also a number of options for carrying out the actual
comparison between images. Generally similarity between two images is based on a computation involving the
Euclidean distance or histogram intersection between the respective extracted features of two images. The three most
common characteristics upon which images are compared in content based image retrieval algorithms are color, shape
and texture [7]. Utilizing shape information for automated image comparisons requires algorithms that perform some
form of edge detection or image segmentation. The color feature is one of the most widely used visual features in image
retrieval. In image retrieval, the color histogram is the most commonly used color feature representation. Statistically, it
represents the intensities of the three color channels. Swain and Ballard proposed histogram intersection, an L1 metric, as
the similarity measure for the color histogram [8]. In 2009, Ji-quan Ma presented an image retrieval approach based on the HSV color space
and texture characteristics [5]. In March 2011, Neetu Sharma, Paresh Rawat and Jaikaran
Singh compared various global descriptor attributes and found that images retrieved by using the global color histogram
may not be semantically related even though they share a similar color distribution in some results [4].
3. THEORY RELATED TO WORK
3.1 CBIR
A CBIR system incorporates a query image and an image database. The purpose of a CBIR system is to retrieve
the images from the database which are similar to the query image. CBIR is performed in two steps: indexing and
searching. In the indexing step, the contents (features) of the image are extracted and stored in the form of a feature vector in
the feature database. In the searching step, the feature vector of the user's query image is constructed and compared with all feature
vectors in the database to retrieve the images most similar to the query image.
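The indexing and searching steps described above can be sketched as follows. This is a minimal illustration in Python; the 8-bin gray-level histogram used here is only a stand-in feature extractor, not the descriptor used in this work:

```python
import math

def extract_features(image):
    # Stand-in feature extractor: a normalized 8-bin gray-level histogram.
    # `image` is a 2-D list of intensities in the range 0..255.
    hist = [0] * 8
    count = 0
    for row in image:
        for px in row:
            hist[px * 8 // 256] += 1
            count += 1
    return [h / count for h in hist]

def index_database(images):
    # Indexing step: extract and store a feature vector for every database image.
    return [extract_features(img) for img in images]

def search(query, feature_db, k=5):
    # Searching step: compare the query vector against all stored vectors
    # and return the indices of the k most similar images (smallest distance).
    q = extract_features(query)
    dists = [(math.dist(q, f), i) for i, f in enumerate(feature_db)]
    return [i for _, i in sorted(dists)[:k]]

db = [[[0, 0], [0, 0]], [[255, 255], [255, 255]], [[0, 255], [255, 0]]]
feature_db = index_database(db)
results = search([[0, 0], [0, 1]], feature_db, k=2)
```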
3.2 Color Representations
Color is one of the most widely used visual features in multimedia contexts and image/video retrieval. Color is
a subjective human sensation of visible light, depending on intensity and on a set of wavelengths associated with the
electromagnetic spectrum.
• RGB color space
Color is a subjective visual characteristic describing how perceived electromagnetic radiation F(λ) is distributed
over the wavelengths λ of visible light [380 nm ... 780 nm]. A color space is a multidimensional space of color
components. Human color perception combines the three primary colors: red (R), green (G) and blue (B).
The RGB color space is not perceptually uniform: equal distances in different areas of the space do not reflect equal
perceptual dissimilarity of colors. Because of the lack of a single perceptually uniform color space, a large number of
spaces derived from the RGB space have been used in practice for query-by-color.
3.3 Statistical features
The texture of an image can be analyzed using a statistical approach. We can use statistical parameters to
characterize the content of an image. Statistical methods can be further classified into first-order (one pixel), second-
order (two pixels) and higher-order (three or more pixels) statistics. The basic difference is that first-order statistics
estimate properties (e.g. average and variance) of individual pixel values, ignoring the spatial interaction between image
pixels, whereas second- and higher order statistics estimate properties of two or more pixel values occurring at specific
locations relative to each other. The histogram-based approach represents the intensity-value concentrations of all or part of
an image as a histogram. Common features include moments such as mean, variance, dispersion, mean
square value (average energy), entropy, skewness and kurtosis.
3.4 Global Histogram Based Approach
This approach calculates the RGB global histograms for all the images, reduces the dimensions of the
image descriptor vectors using Principal Component Analysis, and calculates the similarity measures between the images.
The clustering results are then analyzed to see if the results have any semantic meaning.
3.4.1 Features based on Histogram
• Mean
The mean of a data set is simply the arithmetic average of the values in the set, obtained by summing the values
and dividing by the number of values.
The mean is a measure of the center of the distribution. The mean is a weighted average of the class marks, with
the relative frequencies as the weight factors.
• Variance and Standard Deviation
The variance of a data set is the arithmetic average of the squared differences between the values and the mean:
variance = (1/N) Σi (xi − μ)²
The standard deviation S is the square root of the variance: S = sqrt(variance).
The variance and the standard deviation are both measures of the spread of the distribution about the mean.
• Skew
Skew is a measure of the extent to which a data distribution is distorted from a symmetrical normal distribution. The
distortion is in one direction, either toward higher values or lower values. The skew measures the asymmetry (unbalance)
about the mean in the gray-level distribution. Skew can be calculated using the formula
Skew = (1/S³) Σg (g − μ)³ P(g)
where μ is the mean, S is the standard deviation and P(g) is the normalized histogram value (probability) of gray level g.
• Entropy
Entropy is a statistical measure of randomness that can be used to characterize the texture of an image. For a grayscale image, the entropy E is defined as
E = −Σ p · log2(p)
where p contains the normalized histogram counts.
• Energy
The energy measure tells us something about how gray levels are distributed. The energy measure has a value of
1 for an image with a constant value. This value gets smaller as the pixel values are distributed across more gray level
values. A high energy means the number of gray levels in the image is few. Therefore it is easier to compress the image
data. Energy E can be calculated as
E = Σg [P(g)]²
where P(g) = Ng / (M·N) is the normalized histogram value of gray level g, Ng is the number of pixels X(i, j) with gray level g, M and N are the dimensions of the image, and X(i, j) is the intensity of the pixel located at row i and column j in the
image map.
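The histogram-based features of this section (mean, variance/standard deviation, skew, entropy and energy) can all be computed together from the histogram counts. The following sketch assumes the standard definitions given above, with skew taken as the third standardized moment:

```python
import math

def histogram_features(hist):
    # hist: raw histogram counts, one entry per gray level.
    # Returns the first-order statistical features of Section 3.4.1,
    # all computed from the normalized histogram P(g).
    total = sum(hist)
    p = [c / total for c in hist]
    mean = sum(g * pg for g, pg in enumerate(p))
    var = sum((g - mean) ** 2 * pg for g, pg in enumerate(p))
    std = math.sqrt(var)
    # Third standardized moment; zero for a symmetric distribution.
    skew = sum((g - mean) ** 3 * pg for g, pg in enumerate(p)) / std ** 3 if std else 0.0
    # Entropy: -sum(p * log2(p)) over non-empty bins.
    entropy = -sum(pg * math.log2(pg) for pg in p if pg > 0)
    # Energy: sum of squared probabilities; 1.0 for a constant image.
    energy = sum(pg ** 2 for pg in p)
    return {"mean": mean, "std": std, "skew": skew,
            "entropy": entropy, "energy": energy}

# A constant image: all 16 pixels share one gray level.
flat = histogram_features([16, 0, 0, 0])
# A uniform spread over 4 gray levels.
spread = histogram_features([4, 4, 4, 4])
```

As the energy definition predicts, the constant image yields energy 1.0 and entropy 0, while the uniform spread yields a lower energy and the maximum entropy.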
3.5 Co-occurrence Matrix
The co-occurrence matrix represents the distance and angular spatial relationship over an image sub-region of
specific size [14]. The GLCM is created from a gray-scale image. It calculates
how often a pixel with gray value i occurs horizontally, vertically, or diagonally adjacent to pixels with the value
j. The GLCM can be used to derive different statistics which provide information about the texture of the image.
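A minimal sketch of how such a matrix can be built in pure Python (horizontal neighbours only; `levels` is the assumed number of gray levels in the input):

```python
def glcm(image, dx=1, dy=0, levels=4):
    # Gray Level Co-occurrence Matrix: m[i][j] counts how often a pixel
    # with gray level i has a neighbour with gray level j at offset (dx, dy).
    # (dx, dy) = (1, 0) counts horizontal neighbours.
    rows, cols = len(image), len(image[0])
    m = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c]][image[r2][c2]] += 1
    return m

img = [[0, 0, 1],
       [0, 0, 1],
       [0, 2, 2]]
horizontal = glcm(img, dx=1, dy=0)
```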
4. EXPERIMENTAL METHODOLOGY
The data set contains 250 JPEG images, used to evaluate the effectiveness and efficiency of the selected color
features. Before processing begins, the query image and the data set images are resized to the same dimensions.
Images are represented in the RGB color space and the features are extracted using histograms. The GCH (Global Color
Histogram) represents an image with a single histogram. Then the relevant images are identified based on different color
features and their combinations. In all three cases the 5 most similar images are displayed.
In the first level images are divided into fixed blocks of size 16 x 16. For each block, its color histogram is
obtained. The GCH of the query image and of the images in the data set is computed and the distance is measured. Relevant images
are retrieved by computing the similarity between the query vector and the image vectors.
There are several ways to calculate the distance between two vectors. Here the sum of the distances between the
RGB values of the pixels in the same position is calculated.
In the next level, the visual feature representation is extracted by incorporating the color histograms and color
moments. The texture features of the query image and of the images in the data set are calculated using the above-mentioned
equations and stored. The relevant images are ranked using a fixed threshold on the difference between the texture
features of the query image and the database images.
4.1 Histogram
The system uses global color histograms to extract the color features of images. The RGB values of the
image are quantized and stored in a histogram vector of size 64. A number of distance measures are available to find the
difference between two vectors.
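One common way to obtain a 64-bin vector, assumed here since the paper does not give the exact quantization, is to quantize each of R, G and B to 4 levels, giving 4 × 4 × 4 = 64 combined bins:

```python
def global_color_histogram(pixels):
    # 64-bin global color histogram: each R, G, B channel is quantized to
    # 4 levels (2 bits), giving 4*4*4 = 64 combined bins, then normalized.
    hist = [0] * 64
    for r, g, b in pixels:
        bin_index = (r // 64) * 16 + (g // 64) * 4 + (b // 64)
        hist[bin_index] += 1
    n = len(pixels)
    return [h / n for h in hist]

# Two pure-red pixels and two pure-blue pixels.
h = global_color_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)])
```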
4.2 Euclidean Distance
Euclidean distance between two color histograms h and g can be calculated as [13]
d(h, g) = sqrt( Σi (h(i) − g(i))² )
The Euclidean distance is calculated between the query image and every image in the database. All the images
in the data set have been compared with the query image. Upon completion of the Euclidean distance algorithm, we have
an array of Euclidean distances, which is then sorted. The five topmost images are then displayed as a result of the
texture search.
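The distance computation and top-5 selection described above can be sketched as:

```python
import math

def euclidean(h, g):
    # d(h, g) = sqrt(sum_i (h[i] - g[i])^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, g)))

def top_k(query_hist, db_hists, k=5):
    # Distance from the query to every database histogram, sorted ascending;
    # the k smallest distances give the k most similar images.
    dists = sorted((euclidean(query_hist, h), i) for i, h in enumerate(db_hists))
    return dists[:k]

db = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.4]]
results = top_k([1.0, 0.0], db, k=2)
```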
4.3 SAD (Sum of Absolute Differences)
SAD is an algorithm for measuring the similarity between image blocks. It works by taking the absolute
difference between each pixel in the original block and the corresponding pixel in the block being used for comparison.
These differences are summed to create a simple metric of block similarity. Here the block size is defined as
16. The sum of the differences between the pixel values of the query image and the images in the data set is calculated and
used to find the color similarity of the images.
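A minimal SAD sketch over two equally sized blocks:

```python
def sad(block_a, block_b):
    # Sum of Absolute Differences between two equally sized pixel blocks:
    # the absolute per-pixel differences are accumulated into one score.
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

a = [[10, 20], [30, 40]]
b = [[12, 18], [30, 45]]
d = sad(a, b)  # |10-12| + |20-18| + |30-30| + |40-45| = 9
```

A score of 0 means the blocks are identical; larger scores mean less similar blocks.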
4.4 Level 1
• The input image is read and the color features of the image are extracted using the GCH and stored
• The color features of each image in the data set are calculated and stored
• The similarity between the query image and the images in the data set is calculated
• The images are sorted based on the distance
• The 5 most similar images are displayed.
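The Level 1 steps above can be sketched as follows. The global color histograms are assumed to be extracted already, and the SAD-style distance of Section 4.3 is applied to the histogram vectors:

```python
def level1_search(query_hist, db_hists, k=5):
    # Level 1: rank database images by the sum of absolute differences
    # between their global color histograms and the query histogram.
    def dist(h):
        return sum(abs(a - b) for a, b in zip(query_hist, h))
    ranked = sorted(range(len(db_hists)), key=lambda i: dist(db_hists[i]))
    return ranked[:k]

db = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0], [0.4, 0.4, 0.2]]
top = level1_search([0.5, 0.5, 0.0], db, k=2)
```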
4.5 Level 2
• The input image is read and converted to gray scale image
• Construct a Gray Level Co-occurrence Matrix of the image
• The texture features mean, standard deviation, variance, entropy, skew and energy are calculated from the GLCM
and stored.
• Each image in the data set is taken and its texture features are calculated and stored.
• Calculate and store the Euclidean distance of input image and images in the data set
• Cluster and sort the images by keeping distance as the key
• Retrieve the images with minimum distance.
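The Level 2 steps can be sketched as follows. This is a simplified illustration: only four of the six listed texture features are computed, and the GLCMs are assumed to be built already:

```python
import math

def texture_vector(m):
    # Texture features from a co-occurrence matrix: mean, standard
    # deviation, entropy and energy of its normalized entries.
    total = sum(sum(row) for row in m)
    p = [v / total for row in m for v in row]
    mean = sum(p) / len(p)
    std = math.sqrt(sum((v - mean) ** 2 for v in p) / len(p))
    entropy = -sum(v * math.log2(v) for v in p if v > 0)
    energy = sum(v ** 2 for v in p)
    return [mean, std, entropy, energy]

def level2_search(query_glcm, db_glcms, k=5):
    # Rank database images by the Euclidean distance between their
    # texture vectors and the query's texture vector (smallest first).
    q = texture_vector(query_glcm)
    def dist(m):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, texture_vector(m))))
    ranked = sorted(range(len(db_glcms)), key=lambda i: dist(db_glcms[i]))
    return ranked[:k]

query = [[4, 0], [0, 4]]
database = [[[4, 0], [0, 4]], [[1, 3], [3, 1]]]
top = level2_search(query, database, k=1)
```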
5. RESULT AND DISCUSSION
5.1 Database
The image data set used in this work contains 250 JPEG images randomly selected from the World Wide
Web. The following figure depicts a sample of the images in the database:
Figure: Image Database
Figure: The query image
5.2 Color Extraction & Matching
The color features of the query image and of the images in the data set are calculated and stored separately in
vectors of size 64. The histograms of the query image and the images in the database are compared by calculating the sum of the
differences of the values in the same positions of the histogram vectors, yielding the following top 5 results:
Figure: Color results for the searching of query image — color distances of the top 5 retrieved images: 0.0, 0.454, 0.682, 0.791, 0.999
The result shows that most of the images in the result are unrelated to the query.
5.3 Texture Extraction & Matching
The statistical approach is used to analyse image texture. The statistical features of the images are calculated
using the above-mentioned equations from the GCH of the query image and the data set images, and are compared using the
Euclidean distance metric, yielding the following top 5 results:
Figure: Texture results for the searching of query image — statistical distances of the top 5 retrieved images: 0.0, 0.576, 0.594, 0.632, 0.689
The system is tested with different query images and it is found that the texture search gives better results
than those obtained from the previous search.
6. CONCLUSIONS
An experimental comparison of a number of different color descriptors for content-based image retrieval was
carried out. Color histogram and color moments are considered for retrieval. The application performs a simple color-
based search in an image database for an input query image, using global color histograms. It then compares the color
histograms of different images. The SAD algorithm is used to find the similarity of the images. To enhance the search,
the application performs a statistical feature-based search using the global color histogram. The comparison of the images is
done using the Euclidean distance equation.
According to the result obtained it is found that the performance depends on the color distribution of images.
Most of the images retrieved using the image search based on color feature are unrelated to the query. The test results
indicate that the search which uses the statistical feature gives better results than the color feature search. The
results can be improved further by making a search considering the combined image properties of color and texture. In
addition to that more enhancement can be done by making a search considering the image properties of color, texture and
shape.
REFERENCES
[1] Gaurav Jaswal, Amit Kaul, Rajan Parmar, "Content Based Image Retrieval using Color Space Approaches",
International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume 2,
Issue 1, October 2012.
[2] Poulami Halda, Joydeep Mukherjee, "Content Based Image Retrieval using Histogram, Color and Edge",
International Journal of Computer Applications (0975-888), Volume 48, No. 11, June 2012.
[3] Ja-Hwung Su, Wei-Jyun Huang, Philip S. Yu and Vincent S. Tseng, "Efficient Relevance Feedback
for Content-Based Image Retrieval by Mining User Navigation Patterns", IEEE Transactions on Knowledge
and Data Engineering, Vol. 23, No. 3, March 2011.
[4] Neetu Sharma, Paresh Rawat and Jaikaran Singh, "Efficient CBIR Using Color Histogram Processing", Signal &
Image Processing: An International Journal (SIPIJ), Vol. 2, No. 1, March 2011.
[5] Ji-quan Ma (Heilongjiang University, Harbin, China), "Content-Based Image Retrieval with HSV Color Space and
Texture Features", International Conference on Web Information Systems and Mining, 2009.
[6] Sharmin Siddique, "A Wavelet Based Technique for Analysis and Classification of Texture Images", Carleton
University, Ottawa, Canada, Proj. Rep. 70.593, April 2002.
[7] A. Jain and A. Vailaya, "Image Retrieval using Color and Shape", Elsevier Science Ltd, vol. 29, pp. 1233-1244,
1996.
[8] M. J. Swain and D. H. Ballard, "Color Indexing", International Journal of Computer Vision, vol. 7, pp. 11-32, 1991.
[9] Robson Barcellos, Rogério Oliani Saranz, Luciana Tarlá Lorenzi, Adilson Gonzaga (Universidade de São Paulo),
"Content Based Image Retrieval Using Color Autocorrelograms in HSV Color Space".
[10] G. N. Srinivasan and Shobha G., "Statistical Texture Analysis", Proceedings of World Academy of Science,
Engineering and Technology, Volume 36, December 2008, ISSN 2070-3740.
[11] S. Selvarajah and S. R. Kodituwakku, "Analysis and Comparison of Texture Features for Content Based Image
Retrieval", International Journal of Latest Trends in Computing, Volume 2, March 2011.
[12] S. R. Kodituwakku and S. Selvarajah, "Comparison of Color Features for Image Retrieval", Indian Journal of Computer
Science and Engineering, Vol. 1, No. 3, pp. 207-211.
[13] Bongani Malinga, Daniela Raicu, Jacob Furst, "Local vs. Global Histogram-Based Color Image Clustering",
DePaul University, Technical Report TR06-010, 2006.
[14] V. Vinitha, J. Jagadeesan and R. Augustian Isaac, "Web Image Search Reranking
Using CBIR", International Journal of Computer Science & Engineering Technology (IJCSET).
[15] Prashant Chatur, Pushpanjali Chouragade, "Visual Rerank: A Soft Computing Approach for Image Retrieval
from Large Scale Image Database", International Journal of Computer Engineering & Technology (IJCET),
Volume 3, Issue 3, 2012, pp. 446-458, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[16] Prashant Chatur and Pushpanjali Chouragade, "A Soft Computing Approach for Image Searching using
Visual Reranking", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2,
2013, pp. 543-555, ISSN Print: 0976-6367, ISSN Online: 0976-6375.