The project aims to develop an efficient segmentation method for a CBIR system. Mean-shift segmentation generates a list of potentially meaningful objects, which are then clustered according to a predefined similarity measure. The method was tested on benchmark data and achieved an F-score of 0.30.
1. Content based image retrieval in large image databases
Lukasz Miroslaw, Ph.D., Wojciech Tarnawski, Ph.D.
Institute of Informatics
Wroclaw University of Technology, Poland
4. How complex is the universe?
Fig. Credit: Wikipedia. Hubble Ultra Deep Field image of a region
of the observable universe (equivalent sky area size shown in
bottom left corner), near the constellation Fornax. Each spot is a
galaxy, consisting of billions of stars. The light from the smallest,
most redshifted galaxies is thought to have originated roughly 13
billion years ago.
5. Why Do We Need Metaheuristics?
Definition: A metaheuristic is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality (Wikipedia).
• Stochastic methods: simulated annealing, evolutionary algorithms, ant colony optimization.
• Deterministic methods: gradient methods, tabu search, dynamic programming, etc.
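To make the "iteratively improve a candidate solution" idea concrete, here is a minimal sketch of simulated annealing, one of the stochastic methods listed above. The cost function, the linear cooling schedule and all parameter values are illustrative assumptions, not taken from the slides.

```python
import math
import random


def simulated_annealing(cost, start, steps=5000, step_size=0.5,
                        start_temperature=1.0, seed=0):
    """Minimise a 1-D cost function (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    for i in range(steps):
        # Linear cooling schedule (an assumption); kept strictly positive.
        temperature = start_temperature * (1.0 - i / steps) + 1e-9
        candidate = current + rng.gauss(0.0, step_size)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Improvements are always accepted; worse candidates are accepted
        # with a probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost


# Toy quality measure with its minimum at x = 3.
best_x, best_cost = simulated_annealing(lambda x: (x - 3.0) ** 2, start=0.0)
```

The best solution seen so far is tracked separately from the current one, because the current candidate is allowed to get worse while the temperature is high.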
7. Is Selection of the best the best?
Prof. Roman Galar, Dr Artur Chorazyczewski, Prof. Iwona Karcz-Duleba
8. Content Based Image Retrieval
Project funded by the Polish-Singapore Programme.
Duration: 3 years, until March 2011.
Objective: Content-based image retrieval in large databases.
Problem: What does “similar” mean? How to define a similarity measure between two images?
9. Brain’s visual circuits do error correction on the fly
Predictive coding.
Hierarchical layers detect objects in a bottom-up manner.
Fig. Layer hierarchy (labels from top to bottom): final objects; more complex shapes / textures / detailed geometry; horizontal / vertical lines / basic geometrical shapes. Additional label: prediction error.
Credit: Physorg.org
Egner T., Monti J.M., Summerfield C., Expectation and Surprise Determine Neural Population Responses in the Ventral Visual Stream, J Neuroscience 2010, 30(49):16601-16608.
10. Multi-Scale Approach: Anisotropic diffusion
Here, a number of images are generated by convolving the original image with a Gaussian kernel of variance t (the scale).
The resulting set of images I(x,y,t) is equivalent to the solutions of the heat transport (diffusion) problem on the plane.
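In the usual linear scale-space notation this construction can be written as follows. The symbols $I_0$ (the input image), $G$ (the kernel) and $*$ (convolution) are assumed here, and the kernel variance is taken equal to the scale $t$.

```latex
I(x,y,t) = I_0(x,y) * G(x,y;t),
\qquad
G(x,y;t) = \frac{1}{2\pi t}\exp\!\left(-\frac{x^{2}+y^{2}}{2t}\right)

\frac{\partial I}{\partial t} = \frac{1}{2}\,\nabla^{2} I,
\qquad
I(x,y,0) = I_0(x,y)
```

With the variance used as the scale parameter, the family $I(x,y,t)$ satisfies the heat equation with diffusivity $1/2$, which is the equivalence the slide refers to.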
Since the convolution operation smooths region boundaries, we decided to use edge-preserving isotropic diffusion.
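The slide does not name the diffusion scheme. As a minimal sketch, assuming a Perona-Malik-style update with a scalar edge-stopping function g(d) = exp(-(d/kappa)^2), edge-preserving smoothing on a small grey-level grid could look like this. The function name, `kappa`, `step` and the test image are illustrative assumptions.

```python
import math


def diffuse(image, iterations=10, kappa=0.1, step=0.2):
    """Edge-preserving diffusion on a 2-D list of floats (illustrative sketch)."""
    rows, cols = len(image), len(image[0])
    current = [row[:] for row in image]

    def conductance(difference):
        # Close to 1 for small differences (smooth), close to 0 across edges.
        return math.exp(-(difference / kappa) ** 2)

    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                centre = current[r][c]
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        difference = current[rr][cc] - centre
                        flux += conductance(difference) * difference
                # step <= 0.25 keeps the explicit 4-neighbour scheme stable.
                updated[r][c] = centre + step * flux
        current = updated
    return current


# A vertical step edge (0 -> 1) with one noisy pixel on the dark side.
noisy_step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
noisy_step[1][1] = 0.05
smoothed = diffuse(noisy_step)
```

In this toy image the small perturbation is smoothed away, while the large jump between the two halves is left almost untouched because the conductance across it is practically zero.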
11. Mean-Shift Segmentation
The “mean-shift”-based image segmentation algorithm (3) performs non-parametric clustering in a 5D image space (3D color space + 2D planar space). The method does not need to know the number or the shape of the clusters in advance.
(3) D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, 2002.
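A minimal sketch of the mean-shift idea, assuming a flat (uniform) kernel with a single bandwidth. The cited algorithm uses kernel density estimation with separate spatial and color bandwidths, so this is only an illustration. Feature vectors are plain tuples; for segmentation they would be 5D (color plus pixel position). All names are hypothetical.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Cluster feature tuples by shifting each one to a local density mode."""

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    modes = []
    for point in points:
        current = tuple(point)
        for _ in range(max_iter):
            # Flat kernel: every point inside the window has equal weight.
            window = [q for q in points if dist(current, q) <= bandwidth]
            shifted = tuple(sum(axis) / len(window) for axis in zip(*window))
            converged = dist(shifted, current) < tol
            current = shifted
            if converged:
                break
        modes.append(current)

    # Points whose modes (nearly) coincide belong to the same cluster.
    centres, labels = [], []
    for mode in modes:
        for index, centre in enumerate(centres):
            if dist(mode, centre) < bandwidth / 2:
                labels.append(index)
                break
        else:
            centres.append(mode)
            labels.append(len(centres) - 1)
    return labels, centres


samples = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels, centres = mean_shift(samples, bandwidth=1.0)
```

The number of clusters is never passed in; it follows from how many distinct modes the points converge to, which is the property highlighted on the slide.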
12. The Framework
The aim of the proposed segmentation is to partition the image into non-overlapping, most meaningful image regions in order to obtain a very generalized (coarse) view of natural images.
1) Input image
2) Isotropic diffusion step
3) Mean-shift based segmentation
4) Accumulating the clustering results over all scales
5) Mode-based clustering of values collected in accumulator space
6) “Visualization” of the clusters detected in the previous step: mapping input pixels into a set of cluster labels
7) Region merging taking into account region size and planar co-occurrence
8) Output image
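Step 7 is not detailed on the slide. One plausible reading, offered purely as an assumption, is that regions below a size threshold are absorbed by the neighbouring region with which they share the longest boundary. The sketch below implements that reading on a small label grid; the function name and `min_size` are hypothetical.

```python
from collections import Counter


def merge_small_regions(labels, min_size):
    """Merge every region smaller than min_size into its dominant neighbour."""
    rows, cols = len(labels), len(labels[0])
    labels = [row[:] for row in labels]
    while True:
        sizes = Counter(label for row in labels for label in row)
        small = [label for label, size in sizes.items() if size < min_size]
        if not small or len(sizes) == 1:
            return labels
        # Handle the smallest region first.
        target = min(small, key=lambda label: (sizes[label], label))
        # Count how many boundary contacts the region has with each neighbour.
        border = Counter()
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] != target:
                    continue
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] != target:
                        border[labels[rr][cc]] += 1
        neighbour = border.most_common(1)[0][0]
        labels = [[neighbour if label == target else label for label in row]
                  for row in labels]


segmentation = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 0, 1, 1],
]
merged = merge_small_regions(segmentation, min_size=3)
```

Here the single-pixel region 2 touches region 0 on three sides and region 1 on one side, so it is merged into region 0.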
13. Gestalt Theory in Segmentation
“Visualization” of clusters detected in the accumulator space: Grouping laws
The most-meaningful (merged) regions: Collaboration laws
14. Image Retrieval System
Database path: Image database → GEN-SEG image segmentation → Calculation of MPEG-7-based features for segmented regions
Query path: Query image → GEN-SEG image segmentation → Calculation of MPEG-7-based features for segmented regions
Both paths → Image-to-image similarity calculation → List of the most similar images
15. Descriptors Used
The Moving Picture Experts Group (MPEG) defined several visual descriptors and the corresponding distance measures (MPEG-7 standard):
Scalable Color Descriptor – a color histogram in HSV color space, encoded by a Haar transform.
Color Layout Descriptor – represents the spatial distribution of the color of visual signals.
Edge Histogram – represents the spatial distribution of five types of edges: four directional edges and one non-directional edge.
Texture Browsing – a 5D descriptor related to the perceptual characterization of texture in terms of regularity, coarseness and directionality.
Region Shape – describes the shape of an object in an image. The descriptor is robust to noise that may be introduced in the process of segmentation.
16. Image Retrieval Concept
We have defined the measure of similarity between images as a composition of the distance metrics defined in the MPEG-7 standard. The procedure consists of the following steps:
1. Calculate region-to-region distances between the query image and the images in the database for a chosen set of MPEG-7 descriptors.
2. Sort the list of distances for all regions of the query image.
Democracy: each region belongs to a certain image in the database and votes for it.
3. Accumulate the votes and pick the best candidates.
Preliminary results: the method outperforms grid-based segmentation IR (min. F-score = 0.34 on the MGV Database).
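The voting procedure in the steps above can be sketched as follows. This is a simplified illustration: plain Euclidean distance stands in for the MPEG-7 distance measures, each query region is assumed to cast a fixed number of votes, and all names and parameters are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Stand-in for an MPEG-7 descriptor distance (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_regions, database, votes_per_region=2, top_n=3,
             distance=euclidean):
    """Rank database images by votes cast by the query image's regions.

    database is a list of (image_id, region_feature_vector) pairs.
    """
    votes = Counter()
    for query_feature in query_regions:
        # Steps 1-2: region-to-region distances, sorted ascending.
        ranked = sorted(database,
                        key=lambda entry: distance(query_feature, entry[1]))
        # "Democracy": the closest regions vote for the image they belong to.
        for image_id, _ in ranked[:votes_per_region]:
            votes[image_id] += 1
    # Step 3: accumulate the votes and pick the best candidates.
    return [image_id for image_id, _ in votes.most_common(top_n)]


region_database = [
    ("a", (0.0, 0.0)), ("a", (1.0, 1.0)),
    ("b", (10.0, 10.0)), ("c", (5.0, 5.0)),
]
query = [(0.1, 0.1), (0.9, 0.9)]
best_matches = retrieve(query, region_database, top_n=1)
```

In this toy database both query regions are closest to regions of image "a", so it collects all four votes.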
17. Image-based Genome-scale RNAi Profiling
Objective: Identification of genes involved in the cell division process using the esiRNA gene silencing method.
• Initial object identification based on template matching.
• Post-processing of candidate solutions: Evolutionary Algorithm (soft selection, Gaussian mutation).
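The slide gives no details of the evolutionary algorithm. As a minimal sketch, assuming that "soft selection" means fitness-proportional parent selection and that candidates are real numbers, one generation loop could look like this. The fitness function and every parameter value are illustrative assumptions.

```python
import math
import random


def evolve(fitness, population, generations=50, sigma=0.3, seed=0):
    """Evolutionary search with soft selection and Gaussian mutation."""
    rng = random.Random(seed)
    best = max(population, key=fitness)
    for _ in range(generations):
        weights = [fitness(x) for x in population]
        # Soft selection: every individual may reproduce, with a chance
        # proportional to its (positive) fitness.
        parents = rng.choices(population, weights=weights, k=len(population))
        # Gaussian mutation of each selected parent.
        population = [parent + rng.gauss(0.0, sigma) for parent in parents]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best = champion
    return best, population


def peak(x):
    # Toy fitness landscape with a single maximum at x = 2.
    return math.exp(-(x - 2.0) ** 2)


init_rng = random.Random(1)
initial = [init_rng.uniform(-1.0, 5.0) for _ in range(20)]
best_candidate, final_population = evolve(peak, initial)
```

Unlike selection of the best only, soft selection lets weaker candidates survive occasionally, which keeps the population diverse.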
18. Genome-wide High-Content Screening
Image-based analysis of > 1300 genes’ phenotypes
Identification of > 300 novel genes’ functions.
Fig. GUI Interface of esiImage.
Fig. Detection of mitotic cells in Phase-Contrast Microscopy. HeLa Cell Line.
esiImage: Java-based toolkit for automatic detection of
phenotypes. Author: Lukasz Miroslaw (2006)
Credit: Dr Artur Chorazyczewski
19. Genome-wide High-Content Screening
Fig. Mitotic Index for CDC16 and control.
Kittler R, Pelletier L, Heninger AK, Slabicki M, Theis M, Miroslaw L, Poser I, Lawo S, Grabner
H, Kozak K, Wagner J, Surendranath V, Richter C, Bowen W, Jackson AL, Habermann B, Hyman
AA, Buchholz F. Genome-scale RNAi profiling of cell division in human tissue culture cells.
Nature Cell Biol. 2007 Dec;9(12):1401-12.
20. Rapid Cervical Cancer Diagnosis
Fig. Image partition.
Fig. Phase-Contrast Image with endothelial cells of interest.
80 Statistical Geometrical Features.
Learning and Testing Phase.
Sequential Forward-Backward Selection.
kNN, Linear Fisher Discriminant.
Post-processing.
Dresden Technical University, Prof. Fuchs Image Processing Group
T. Schilling, L. Mirosław, G. Głąb, M. Smereka. Towards rapid cervical cancer diagnosis: automated detection and classification of pathological cells in phase contrast images. Int. J. Gynec Cancer 2007; 17(1):118-26.
42. Computer Vision meets AI
Wroclaw University of Technology
Institute of Informatics
Wyb. Wyspianskiego 27
Wroclaw, Poland
info@ai.pwr.wroc.pl
www.ai.pwr.wroc.pl
44. PATSI: Photo Annotation through Similar
Images with Annotation Length Optimization
Automatic Image Annotation describes a previously unseen image
with a set of keywords from a semantic dictionary.
Hypothesis: Similar images should share similar annotations.
Model:
Image is described by a set of visual features.
Each visual feature is composed of low-level attributes
(shape, color, edges, texture).
[1] M. Stanek, G. Paradowski, H. Kwasnicka, PATSI — Photo Annotation through Finding Similar Images with Multivariate Gaussian Models,
Lecture Notes in Computer Science, 2010, Vol. 6375, pp. 284-291.
[2] B. Broda, M. Stanek, G. Paradowski, H. Kwasnicka, ImageCLEF 2008 (the method was ranked 5 among 18 participants in one of the
45. Model
Image model: multivariate Gaussian distribution.
Images A and B are compared using the Jensen-Shannon divergence, which is defined in terms of the Kullback-Leibler divergence.
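For reference, the textbook forms of the two divergences are given below. The notation is assumed here: $P$ and $Q$ are distributions with densities $p$ and $q$, $M$ is their equal-weight mixture, and $d$ is the feature dimension. The slide does not show which variant or approximation the authors used.

```latex
D_{KL}(P \,\|\, Q) = \int p(x)\,\ln\frac{p(x)}{q(x)}\,dx

D_{JS}(P \,\|\, Q) = \tfrac{1}{2}\,D_{KL}(P \,\|\, M) + \tfrac{1}{2}\,D_{KL}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)

% Closed form for two multivariate Gaussians
% A = N(\mu_A, \Sigma_A), B = N(\mu_B, \Sigma_B):
D_{KL}(A \,\|\, B) = \tfrac{1}{2}\left[
  \operatorname{tr}\!\left(\Sigma_B^{-1}\Sigma_A\right)
  + (\mu_B - \mu_A)^{\top}\Sigma_B^{-1}(\mu_B - \mu_A)
  - d
  + \ln\frac{\det\Sigma_B}{\det\Sigma_A}
\right]
```

Note that the mixture $M$ of two Gaussians is not itself Gaussian, so the Jensen-Shannon divergence between Gaussian image models has no closed form in general and has to be approximated or replaced by a related symmetric measure.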
46. Content Based Image Retrieval
1. Similaris: a web-based system for gathering similarity measures for a given image database.
2. Visible: a web-based system for evaluation of the CBIR system.
Author: Bartek Dzienkowski, M.Sc.
47. Results
Grzegorz Paradowski, Marlena Ochocinska
48. Model-free Detection of Near-
Duplicate Fragments
Problem definition:
Given two random images (i.e. without any a priori knowledge about their content), identify pairs of visually similar, near-duplicate image fragments that are related by affine transformations. The fragments are formed by sets of matched keypoint pairs satisfying the same affine transformation.
Features analysed: Hessian-Laplace, Harris-Affine, MSER, SURF and SIFT descriptors.
Calculate and decompose the affine transformation for each pair of matched triangles.
Build the histogram of parameters for the decomposed affine transformations.
Detect and post-process the high density areas (peaks) in the histogram to build the
near-duplicate fragments in both images.
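The first two steps above can be sketched as follows. The affine map is recovered exactly from one pair of matched triangles and then decomposed into rotation, two scales and a shear term using a QR-style factorisation. The slide does not specify the decomposition or the histogram layout, so both are assumptions, and all function names and bin widths are hypothetical.

```python
import math
from collections import Counter


def affine_from_triangles(src, dst):
    """Return (a11, a12, a21, a22), (tx, ty) with dst = A * src + t."""
    (x1, y1), (x2, y2), (x3, y3) = src
    (u1, v1), (u2, v2), (u3, v3) = dst
    # Edge vectors of both triangles, as columns of 2x2 matrices P and Q.
    p11, p12, p21, p22 = x2 - x1, x3 - x1, y2 - y1, y3 - y1
    q11, q12, q21, q22 = u2 - u1, u3 - u1, v2 - v1, v3 - v1
    det = p11 * p22 - p12 * p21
    if abs(det) < 1e-12:
        raise ValueError("source triangle is degenerate")
    # A = Q * inverse(P)
    i11, i12, i21, i22 = p22 / det, -p12 / det, -p21 / det, p11 / det
    a11 = q11 * i11 + q12 * i21
    a12 = q11 * i12 + q12 * i22
    a21 = q21 * i11 + q22 * i21
    a22 = q21 * i12 + q22 * i22
    tx = u1 - (a11 * x1 + a12 * y1)
    ty = v1 - (a21 * x1 + a22 * y1)
    return (a11, a12, a21, a22), (tx, ty)


def decompose(matrix):
    """Factor A = R(rotation) * [[scale_x, shear], [0, scale_y]]."""
    a11, a12, a21, a22 = matrix
    scale_x = math.hypot(a11, a21)
    if scale_x == 0.0:
        raise ValueError("matrix is singular")
    return {
        "rotation": math.atan2(a21, a11),
        "scale_x": scale_x,
        "shear": (a11 * a12 + a21 * a22) / scale_x,
        "scale_y": (a11 * a22 - a12 * a21) / scale_x,
    }


def histogram_bin(params, angle_step=math.radians(10), scale_step=0.25):
    # Quantise a few parameters so that similar transforms share a bin.
    return (round(params["rotation"] / angle_step),
            round(params["scale_x"] / scale_step),
            round(params["scale_y"] / scale_step))


# Two triangle pairs related by the same map: rotate 90 degrees,
# scale by 2, translate by (1, 2).
pairs = [
    (((0, 0), (1, 0), (0, 1)), ((1, 2), (1, 4), (-1, 2))),
    (((2, 2), (3, 2), (2, 3)), ((-3, 6), (-3, 8), (-5, 6))),
]
histogram = Counter()
for source, target in pairs:
    matrix, translation = affine_from_triangles(source, target)
    histogram[histogram_bin(decompose(matrix))] += 1
params = decompose(affine_from_triangles(*pairs[0])[0])
```

Both triangle pairs fall into the same histogram bin, which is the kind of peak that the last step would detect and turn into a near-duplicate fragment.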
[1] Mariusz Paradowski, Andrzej Sluzek, Matching Planar Objects of Images using Histograms of Decomposed Affine Transforms. Submitted to Pattern
Recognition.
[2] Mariusz Paradowski, Andrzej Sluzek, Detection of Image Fragments Related by Affine Transforms: Matching Triangles and Ellipses, ICISA 2010, Korea.
49. Results
Fig. panels: road sign; beer cans; box and a bottle; different side of a tower; landscape.
56. Fuzzy Image Retrieval
Automatic Image Annotation describes a previously unseen image
with a set of keywords from a semantic dictionary.
Hypothesis: Similar images should share similar annotations.
Model:
Image is described by a set of visual features.
Each visual feature is composed of low-level attributes
(shape, color, edges, texture).
[1] PATSI — Photo Annotation through Finding Similar Images with Multivariate Gaussian Models, Lecture Notes in Computer Science,
2010, Vol. 6375, pp. 284-291.
57. Institute of Informatics
One of the leading Polish IT institutes.
Main activities: artificial intelligence, machine learning, computer vision, data mining.
Budget: 4.5M EUR (national grants, EU funds)
Collaboration
LMC at ETH Zurich (Image Processing)
Industry: Microsoft, IBM, Volvo, Google.
Research: TU Munchen, TU Dresden, xxx?
58. Team
Prof. Halina Kwaśnicka
Assoc Prof. at WUT, Poland
Deputy Director of Institute of Informatics
Head of the AI Division. Project Leader.

Prof. Urszula Markowska-Kaczmar
Assoc Prof. at WUT.
Expertise in Computational Intelligence, Neural Networks.

Elzbieta Hudyma
Assistant Prof. at WUT, Poland
Expertise in Computer Graphics, Machine Learning.

Mariusz Paradowski
PhD, Post-doc at Nanyang Technological University, Singapore.
Assistant Professor at WUT, Poland
Expertise in Machine Learning and Computer Vision.

Michał Stanek
Pre-doc at WUT, Poland
Expertise in Machine Learning, CBIR systems and Computer Vision.

Krzysztof Michalak, Ph.D.
Expertise in Machine Learning.

Paweł Myszkowski
Assistant Prof. at WUT, Poland
Expertise in Data Mining and Evolutionary Algorithms.

Martin Tabakow
Assistant Prof. at WUT, Poland
Expertise in Machine Learning and Analysis of biomedical images.

Bartłomiej Broda
Pre-doc at WUT, Poland
Expertise in Machine Learning, CBIR systems and image annotations.

Wojciech Tarnawski
PhD, Post-doc in LTNT at ETH Zurich
Assistant Prof. at WUT, Poland
Expertise in Machine Learning and Computer Vision

Lukasz Miroslaw
Assistant Prof. at WUT, Poland
Expertise in Machine Learning, Computer Vision, Bioinformatics.

Master Students: Grzegorz Terlikowski, Sylwester Plamowski, Bartek Dzienkowski, Marlena Ochocinska, Agnieszka Glebala, Lukasz Jercinski
63. Transfer Annotator
Results
Grzegorz Terlikowski, Marlena Ochocinska, Wojciech Tarnawski and Lukasz Miroslaw
64. Model-free Detection of Near-
Duplicate Fragments
Problem definition:
Given two random images (i.e. without any a priori knowledge about their content), identify pairs of visually similar, near-duplicate image fragments that are related by affine transformations. The fragments are formed by sets of matched keypoint pairs satisfying the same affine transformation.
Features analysed: Hessian-Laplace, Harris-Affine, MSER, SURF and SIFT descriptors.
Calculate and decompose the affine transformation for each pair of matched triangles.
Build the histogram of parameters for the decomposed affine transformations.
Detect and post-process the high density areas (peaks) in the histogram to build the
near-duplicate fragments in both images.
[1] Mariusz Paradowski, Andrzej Sluzek, Matching Planar Objects of Images using Histograms of Decomposed Affine Transforms. Submitted to peer-
reviewed journal.
[2] Mariusz Paradowski, Andrzej Sluzek, Detection of Image Fragments Related by Affine Transforms: Matching Triangles and Ellipses, ICISA 2010, Korea.
65. Results
High precision, lower recall.
Real-time analysis is possible.
The method was tested on 3 databases:
• an in-house database of 100 outdoor/indoor images containing objects acquired in different conditions (4950 image pairs)
• the Faces category of Caltech101 (180K image pairs)
• Oxford5K (270K image pairs)
66. Results
Fig. panels: road sign; beer cans; box and a bottle; different side of a tower; landscape.
68. Results
Coverage, face area [%]   10    20    30    40    50
Precision [object]        0.48  0.71  0.89  0.95  0.96
Recall [object]           0.87  0.84  0.81  0.78  0.72
Tab. Face identification. Tested on 188790 image pairs (Caltech101 face category)
69. Results
                     Without ROI   With ROI
Precision [object]   0.88          0.94
Recall [object]      0.56          0.54
Tab. Image Retrieval. Tested on 276970 image pairs (Oxford5K)
Michal Stanek, Oskar Maier, Halina Kwasnicka, "Wroclaw University of Technology Participation at ImageCLEF 2010 Photo Annotation Track", Conf. on Multilingual and Multimodal Information Access Evaluation, 20-23 Sep 2010, Padua, Italy.
H. Kwaśnicka, M. Paradowski, M. Stanek, M. Spytkowski, A. Śluzek, Intelligent approaches to searching similar images on the basis of visual content, ICI 2010, 10th Int. Conf. on Information at Delta University for Science and Technology.
71. Project proposals #1
Let’s look inside PubMed
Consortium:
BIOTEC (Prof. Schroeder): www.gopubmed.com
Wroclaw University of Technology (CBIR)
ETH Zurich (Computer Vision)
Objective: A semantic search engine for life
science images.
72. Project proposals #2
Automated Sign Language Recognition (ASLR)
Consortium:
xxx
Wroclaw University of Technology (Computer Vision, AI, Machine
Learning)
ETH Zurich (Computer Vision, Gesture Analysis)
Objective: To develop a translation system to
make it possible to communicate with deaf
people.
Problems: sign language is not international; there are three types of gestures: finger spelling, word-level sign vocabulary, and tongue, body and mouth position.
78. Results – topological matching
Fig. panels: two objects; same text; deformed book; same scene, some differences; different camera position.
Editor's Notes
Gestalt theory first arose in 1890 as a reaction to the prevalent psychological theory of the time, atomism. Atomism examined parts of things with the idea that these parts could then be put back together to make wholes. Atomists believed the nature of things to be absolute and not dependent on context. Gestalt theorists, on the other hand, were intrigued by the way our mind perceives wholes out of incomplete elements [1, 2]. "To the Gestaltists, things are affected by where they are and by what surrounds them...so that things are better described as 'more than the sum of their parts.'" [1, p. 49]. Gestaltists believed that context was very important in perception. An essay by Christian von Ehrenfels discussed this belief using a musical example. Take a 12-note melody. Play it in one key, say the key of C. Now change to another key, say the key of A flat. There might not be any notes in common between the two versions, yet a person listening knows that it is the same tune. It is the relationships between the notes that give us the tune, the whole, not which notes make up the tune.
Screenshots.
Near-duplicate fragments usually represent the same objects captured under different external conditions, e.g. position, lighting, camera setup.
Recall is the fraction of correct instances among all instances that actually belong to the relevant subset, while precision is the fraction of correct instances among those that the algorithm believes belong to the relevant subset.
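As a small illustration of these definitions, precision and recall, together with the F-score (their harmonic mean, the measure quoted for the retrieval results), can be computed from two sets of item identifiers. The function name and the toy sets are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F-score for sets of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    correct = len(retrieved & relevant)
    # Precision: correct items among those the algorithm returned.
    precision = correct / len(retrieved) if retrieved else 0.0
    # Recall: correct items among all truly relevant items.
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        f_score = 0.0
    else:
        f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score


precision, recall, f_score = precision_recall_f1(
    retrieved={3, 4, 5}, relevant={1, 2, 3, 4})
```

In this toy case two of the three retrieved items are correct and two of the four relevant items are found.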
"MAGMA: efficient method for image annotation in low dimensional feature space based on Multivariate Gaussian Models", Bartosz Broda, Halina Kwaśnicka, Mariusz Paradowski, Michał Stanek, Proceedings of the International Multiconference on Computer Science and Information Technology [electronic document], Mrągowo, Poland, October 12-14, 2009 / M. Ganzha, M. Paprzycki (eds). Katowice: Polskie Towarzystwo Informatyczne, Oddział Górnośląski, 2009, pp. 131-138.
"PATSI — Photo Annotation through Finding Similar Images with Multivariate Gaussian Models", Michał Stanek, Bartosz Broda and Halina Kwaśnicka, Proceedings of International Conference on Computer Vision and Graphics 2010. To appear: Lecture Notes in Computer Science.
"PATSI -- Photo annotation through Finding Similar Images with annotation length optimization", Oskar Maier, Michal Stanek, Halina Kwasnicka, paper submitted to the International Joint Conference Intelligent Information Systems.