(General) To retrieve a clean dataset by deleting outliers.
(Computer Vision) the recovery of a digital image that has been contaminated by additive white Gaussian noise.
Manifold Blurring Mean Shift algorithms for manifold denoising, report, 2012
Computer Vision Project
Manifold Blurring Mean Shift algorithms for manifold denoising
Kévin Adda, Florent Renucci
Table of contents
Introduction
I – Description of the algorithm
II – Setting the parameters
II.1 – σ
II.2 – L = dimension of the subspace
II.3 – Number of iterations and number of neighbours
III – Denoising and blurring of a manifold
III.1 – Code
III.2 – Results
IV – MNIST letters labelling
IV.1 – Code
IV.2 – Results
V – Conclusion
Appendix 1: Code spiral
Appendix 2: Code letters
Introduction
Denoising a dataset is the pre-processing operation that aims to isolate the points which are consistent with the global pattern of the set. There are many ways of describing the data referred to as noise. In computer vision, image denoising consists in recovering a digital image that has been contaminated by additive white Gaussian noise, whereas video denoising consists in removing noise by spatial methods (for instance, image denoising on each frame of the video), temporal methods, or both.
Manifold denoising consists in finding the numerically distant observations of a dataset, which we usually refer to as outliers. Unlike with images, the source and thus the nature of the noise might be unknown. Hence, methods should not make any assumption about the noise structure.
The objective of this algorithm is to denoise data by blurring it. The blurring step consists in moving each point towards its nearest neighbours, so that the points aggregate: an outlier, whose nearest neighbours are close to each other but far from it, moves closer to that existing group. The motion is computed with a projection based on a Principal Component Analysis. The fewer assumptions we make about the data, the closer to reality the results are, which is why we use a nonparametric method.
This algorithm can be used for denoising alone, or as a pre-processing step before classification. We will thus present the results of its application on some manifolds, and then on the classification of MNIST digits. We will use a neural network and a structured SVM to classify the letters, then pre-process the data, classify it again, and observe a decrease in the error rate.
I – Description of the algorithm
The steps of the Manifold Blurring Mean Shift (MBMS) algorithm are the following:
- Blurring mean-shift update: we build a Gaussian kernel centred on a given point $x_i$, evaluate it on its nearest neighbours $N(x_i)$, and update the point with the resulting weighted mean:
$$x_i \leftarrow \sum_{x_j \in N(x_i)} K(x_i, x_j)\, x_j$$
Where:
$$K(x_i, x_j) = \frac{\exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)}{\sum_{x_{j'} \in N(x_i)} \exp\left(-\frac{\|x_i - x_{j'}\|^2}{2\sigma^2}\right)}$$
- Projection on a sub-dimensional space with PCA: once the data has been centred, the best linear L-dimensional manifold in terms of reconstruction error is computed by projecting the data orthogonally onto the manifold:
$$\min_{\mu, U} \sum_{x_j \in N(x_i)} \left\| x_j - \left( U U^T (x_j - \mu) + \mu \right) \right\|^2$$
Such that:
$$U^T U = Id_L$$
The first step moves the point closer to its neighbours; the second step moves the point closer to the manifold.
In theory, a single step could already give good results; the benefit of combining the two steps is discussed further on. Intuitively, using only the second step amounts to a PCA projection of $x_i - \text{mean}(NN(x_i))$, where $NN(x_i)$ denotes the nearest neighbours of $x_i$, which is mathematically equivalent to setting σ = ∞. Using only the first step amounts to skipping the projection, so that no part of the mean-shift motion is lost, that is to say choosing a subspace of dimension L = 0. These are particular cases of the general MBMS algorithm. Of course, if we set σ = ∞ and dim(sub-dimensional space) = dim(entire space) at the same time, nothing happens: when the subspace is the entire space, the projection cancels the whole motion.
Another interesting point is that the algorithm can take all N points as neighbours, which means that the entire graph is considered at every step. In that case, the algorithm is called MBMSf (f for full). The difference between MBMS and MBMSf is not very important: beyond a certain k, taking more neighbours neither improves nor degrades the result significantly. However, the choice of the parameters is an important trade-off that we discuss below, as it determines the strength of the denoising effect: denoising that is too strong might damage the dataset.
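The two steps described in this section can be sketched in a few lines of NumPy. This is a minimal illustration and not the code of the report: the function name `mbms_iteration` is hypothetical, the neighbours are found by brute force, each point counts among its own neighbours, and the projection is applied to the motion (the part of the mean-shift vector lying in the local PCA subspace is removed), which is the formulation for which L = D gives no motion at all.

```python
import numpy as np

def mbms_iteration(X, k, sigma, L):
    """One MBMS iteration over all points of X (shape N x D); returns a moved copy."""
    N, D = X.shape
    # Pairwise squared Euclidean distances (brute force).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    X_new = X.copy()
    for i in range(N):
        # k nearest neighbours of x_i (x_i itself is included, at distance 0).
        nbrs = np.argsort(sq_dist[i])[:k]
        P = X[nbrs]
        # Step 1: Gaussian mean-shift vector towards the weighted mean.
        if np.isinf(sigma):
            w = np.ones(len(nbrs))          # sigma = inf: uniform weights
        else:
            w = np.exp(-sq_dist[i, nbrs] / (2.0 * sigma ** 2))
        shift = (w / w.sum()) @ P - X[i]
        # Step 2: local PCA on the neighbours; U spans the L leading directions.
        mu = P.mean(axis=0)
        _, _, Vt = np.linalg.svd(P - mu, full_matrices=False)
        U = Vt[:L].T
        # Remove the component of the motion that lies in the PCA subspace.
        X_new[i] = X[i] + shift - U @ (U.T @ shift)
    return X_new
```

Calling the function with `k = N` corresponds to the full-graph variant, and `L = 0` keeps the whole mean-shift motion.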
II – Setting the parameters
II.1 – The kernel variance 𝝈
As explained earlier, the data motion is largest when 𝜎 = ∞. More generally, the greater 𝜎 is, the stronger the denoising effect. Hence, a good choice for 𝜎 is a value for which the algorithm removes the noise without distorting the manifold.
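The effect of 𝜎 on the motion can be illustrated on a toy example. The sketch below uses made-up coordinates and a hypothetical helper `gaussian_weights`; it only shows that a small 𝜎 concentrates the weights on the closest neighbours, while a very large 𝜎 makes them practically uniform, so that the point moves all the way to the plain mean of its neighbours.

```python
import numpy as np

def gaussian_weights(x, neighbours, sigma):
    """Normalised Gaussian weights of the neighbours of x."""
    sq_dist = np.sum((neighbours - x) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return w / w.sum()

# Toy data (assumption): one point and three neighbours, the last one far away.
x = np.array([0.0, 0.0])
neighbours = np.array([[0.1, 0.0], [0.0, 0.5], [2.0, 2.0]])

w_small = gaussian_weights(x, neighbours, sigma=0.3)  # dominated by the closest neighbour
w_large = gaussian_weights(x, neighbours, sigma=1e6)  # practically uniform weights

# Length of the resulting mean-shift motion for each value of sigma.
motion_small = np.linalg.norm(w_small @ neighbours - x)
motion_large = np.linalg.norm(w_large @ neighbours - x)
```

With these values the motion obtained with the large 𝜎 is several times longer than the one obtained with the small 𝜎.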
II.2 – The dimension of the subspace L (intrinsic dimension)
The greater L is, the more the projection respects the general pattern of the manifold. But at the same time, the motion is smaller: in the limit L = D (the dimension of the ambient space), there is no motion at all.
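The limit case can be checked numerically. The following sketch uses random neighbours and an arbitrary motion vector (both made up for the illustration) and assumes that the projection step removes from the motion its component in the L-dimensional PCA subspace; the retained motion then shrinks as L grows and vanishes at L = D.

```python
import numpy as np

rng = np.random.default_rng(0)
neighbours = rng.normal(size=(15, 3))   # 15 neighbours in D = 3 dimensions (toy data)
shift = np.array([0.4, -0.2, 0.1])      # arbitrary mean-shift vector (assumption)

# Principal directions of the centred neighbours, sorted by decreasing variance.
centred = neighbours - neighbours.mean(axis=0)
_, _, Vt = np.linalg.svd(centred, full_matrices=False)

def retained_motion(L):
    """Part of the shift kept after removing its component in the L-dim PCA subspace."""
    U = Vt[:L].T
    return shift - U @ (U.T @ shift)

# Norm of the retained motion for L = 0, 1, 2, 3.
norms = [np.linalg.norm(retained_motion(L)) for L in range(4)]
```

Here `norms[0]` equals the norm of the full shift (L = 0) and `norms[3]` is zero up to rounding error (L = D).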
II.3 – Number of iterations and number of neighbours
Experience shows that 5 iterations are enough for good results. The number of neighbours must be at least about 10; above that, all the results are equivalent.
| | Movement | Denoising effect | Risk of distortion of the manifold |
|---|---|---|---|
| L increase | Decrease | Decrease | Decrease |
| 𝜎 increase | Increase | Increase | Increase |
| k increase (above 10) | No big change | No big change | No big change |
| More iterations (above 5) | No big change | No big change | No big change |
III – Denoising and blurring of a manifold
Following the paper, our first experiment was to apply the algorithm to datasets with an obvious structural pattern. The paper uses a noisy spiral. Looking for such datasets, we chose a pinwheel set generator, which we found on the website of the Harvard Intelligent Probabilistic Systems group.
III.1 – Code
We set the parameters, import a dataset representing a spiral, then:
- compute the k nearest neighbours of each point,
- run step 1: local clustering,
- run step 2: Principal Component Analysis on the nearest neighbours,
- project the point on the sub-dimensional space.
See the spiral code in the appendix, also attached in the .rar archive.
We use a spiral because it is one of the most challenging 2-dimensional geometric shapes for Machine Learning algorithms.
We first built our own dataset ("Spiral Built Dataset"), and then chose to use a spiral dataset found on the Internet, generated by pinwheel.m.
To the noisy pinwheel set, we add uniformly distributed outliers.
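As an illustration of this kind of test set, here is a minimal sketch that builds a noisy spiral and adds uniformly distributed outliers over its bounding box. It is not pinwheel.m and not the "Spiral Built Dataset": the function name, the Archimedean spiral and all default values are assumptions made for the example.

```python
import numpy as np

def noisy_spiral_with_outliers(n_points=300, n_outliers=30, noise=0.1, seed=0):
    """Noisy 2-D spiral followed by uniform outliers; returns (n_points + n_outliers, 2)."""
    rng = np.random.default_rng(seed)
    # Archimedean spiral r = t, perturbed by isotropic Gaussian noise.
    t = np.linspace(0.5, 3.0 * np.pi, n_points)
    spiral = np.column_stack((t * np.cos(t), t * np.sin(t)))
    spiral += rng.normal(scale=noise, size=spiral.shape)
    # Outliers drawn uniformly inside the bounding box of the noisy spiral.
    lo, hi = spiral.min(axis=0), spiral.max(axis=0)
    outliers = rng.uniform(lo, hi, size=(n_outliers, 2))
    return np.vstack((spiral, outliers))

X = noisy_spiral_with_outliers()
```

The first `n_points` rows of `X` are the spiral and the remaining rows are the outliers, which makes it easy to check afterwards where the outliers have moved.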
III.2 - Results
At each iteration, the most numerically distant points are translated and projected toward the manifold, eventually merging with another point. Thus, most of the outliers are cleaned out of the dataset, but the manifold might be damaged.
Here are the results of the following algorithms, each given with its parameters (𝐿, 𝑘, 𝜎). We chose parameters that are close to the ones chosen in the paper, and that give good results on our datasets:
• MBMSk: (1, 15, 1.1);
• MBMSf: MBMSk on the full graph (1, . , 1.1);
• LTP: MBMSk with infinite 𝜎 (1, 15, ∞);
• GBMS: MBMSf with a zero-dimensional projection space (0, . , 1.1).
As we can see, both MBMSf and GBMS, which are full-graph algorithms, damage the manifold from the first iteration, and eventually reduce it to a single point or group of points.
More generally, one must be careful when choosing the parameters, which are interdependent: for instance, it is possible to work on the full graph (i.e. to skip the nearest neighbours step), but only if the data motion is limited (by choosing a relatively small variance).
Here is another example of the GBMS algorithm (L=0, on full graph), with a more
appropriate kernel variance.
While the algorithm reduced the dataset to a single point with a kernel variance equal to 1.1, it produces good results with a variance divided by ten. This shows how sensitive the algorithm is to each of its parameters, which are interdependent.
Thus, there is a trade-off within the whole parameter set, or within certain pairs of parameters. We can set them by hand by looking at the results, or define a systematic way of selecting them. If we use machine learning on the blurred data, we can use the error rate as an indicator of well-chosen parameters.
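One possible way of selecting the parameters from the error rate is a plain grid search. The sketch below is only an assumption about how this could be organised: `select_parameters` and its `evaluate` callback are hypothetical names, and `evaluate(L, k, sigma)` is expected to denoise the data with these parameters, train the classifier and return an error rate measured on held-out data.

```python
from itertools import product

def select_parameters(evaluate, L_values, k_values, sigma_values):
    """Return the (L, k, sigma) triple with the lowest error rate, and that rate."""
    best_params, best_score = None, float("inf")
    for params in product(L_values, k_values, sigma_values):
        score = evaluate(*params)   # user-supplied: denoise, classify, measure error
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in for a real evaluation, only to show the calling convention.
def toy_error(L, k, sigma):
    return abs(L - 1) + abs(k - 15) / 100.0 + abs(sigma - 0.11)

best, error = select_parameters(toy_error, [0, 1, 2], [10, 15], [0.11, 1.1])
```

Because the parameters are interdependent, the search runs over all combinations rather than tuning each parameter separately.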
IV – MNIST letters labelling
IV.1 – Code
First we import the MNIST letters dataset. A function can print the data, which is a collection of 16×8 matrices filled with 0 and 1 (1 for white and 0 for black), each representing a letter.
Then we pre-process the data with the function ImageLabellingFormatting:
- From each matrix representing a letter, we extract the "1" elements. For example, if $m_{1,3} = 1$, we extract the point (1, 3). Doing this for all the matrices, we obtain for each matrix a vector containing the coordinates of the white points.
- We can then apply the previous denoising algorithm.
- If the result is not an integer, for example if we plan to move a pixel to the coordinates (12.54; 14.1), we round it to (13; 14).
- The vector obtained is transformed back into a matrix of 0 and 1: this is the exact inverse of the first task.
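The conversion between a binary image and a point set, and back, can be sketched as follows. This is not the ImageLabellingFormatting function itself: the names `image_to_points` and `points_to_image` are hypothetical, indices are 0-based as usual in Python, and clipping points that leave the image to its border is an assumption.

```python
import numpy as np

def image_to_points(image):
    """Coordinates (row, column) of the pixels equal to 1, as a float array."""
    return np.argwhere(image == 1).astype(float)

def points_to_image(points, shape):
    """Round the moved points to the pixel grid and set those pixels to 1."""
    image = np.zeros(shape, dtype=int)
    idx = np.rint(points).astype(int)
    # Assumption: points that left the image are clipped to its border.
    idx[:, 0] = np.clip(idx[:, 0], 0, shape[0] - 1)
    idx[:, 1] = np.clip(idx[:, 1], 0, shape[1] - 1)
    image[idx[:, 0], idx[:, 1]] = 1
    return image

# A 16x8 letter with two white pixels, converted to points and back.
letter = np.zeros((16, 8), dtype=int)
letter[1, 3] = 1
letter[5, 2] = 1
points = image_to_points(letter)
restored = points_to_image(points, letter.shape)
```

Two points that round to the same pixel simply merge into one white pixel, which is how outliers disappear from the image after denoising.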
IV.2 – Results
We can compare the initial image and the blurred one. The paper does not explain how to blur an image represented the way it is here, but only how to blur an image that has the same representation as the spiral of Section III. This is why we decided to use the same approach, without shades of grey.
It is important to take an even number of neighbours, otherwise a line would not remain a line after blurring: suppose 5 pixels are aligned horizontally and we move the 3rd one using 3 neighbours. The move is then based on 2 neighbours on one side (for example on the left) and 1 neighbour on the other side (for example on the right). In that case, the pixel will be merged with another one, and the line will have holes.
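This argument can be reproduced on the 5 aligned pixels. The sketch below is a simplified illustration under stated assumptions: only the averaging step is applied, the weights are uniform, the moved pixel is not counted among its own neighbours, and ties in distance are broken towards the left.

```python
import numpy as np

row = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # column indices of 5 aligned pixels
i = 2                                        # we move the 3rd pixel

def new_position(n_neighbours):
    """Rounded mean of the n nearest other pixels (uniform weights)."""
    others = np.delete(row, i)
    # Stable sort: equal distances keep their left-to-right order.
    order = np.argsort(np.abs(others - row[i]), kind="stable")
    return int(np.rint(others[order[:n_neighbours]].mean()))

even = new_position(2)   # neighbours at columns 1 and 3
odd = new_position(3)    # neighbours at columns 1, 3 and 0
```

With 2 neighbours the pixel stays at column 2, whereas with 3 neighbours it lands on column 1, merges with the pixel already there and leaves a hole at column 2.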
Using a neural network with one layer, we label the images. The good labelling rate is 51%. This means that this classifier is not really efficient on this problem.
Then we blur the images and repeat the study. The good labelling rate is 53%: pre-processing the data allows better labelling.
This result also appears when we consider a random subset of the dataset: the error rate decreases by 2 to 4 percentage points.
After that, we separate the training dataset from the test dataset. The good labelling rate increases by 3 to 4 percentage points, from 35% to 39%, which means that the algorithm improves the labelling by more than 10% in relative terms, using the neural network.
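The figure of more than 10% is a relative improvement, which can be checked from the two good labelling rates of the training/test experiment:

$$\frac{39\% - 35\%}{35\%} = \frac{4}{35} \approx 0.114$$

A gain of 4 percentage points therefore corresponds to a relative improvement of about 11%. The same computation on the experiment without a training/test split gives $(53 - 51)/51 \approx 0.039$, that is about 4%.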
See the letters code in the appendix, also attached in the .rar archive.
| Good labelling rates | No blurring | Blurring |
|---|---|---|
| Dataset | 51% | 53% |
| Training/test dataset | 35% | 39% |
V – Conclusion
The Manifold Blurring Mean Shift algorithm blurs an image in order to:
- erase some outliers by merging them into the "real" image;
- merge outliers with each other, decreasing their number.
The complexity of the algorithm, which is polynomial in the number of points, makes it difficult to use on bigger sets such as noisy images. However, it is useful on smaller data and small images such as those of the MNIST dataset. The advantage of this algorithm is that it makes no hypothesis on the distribution of the noise and outliers, but computes the associated motion from their position relative to the dataset.
Finally, the algorithm allowed us to decrease the error rate of a multi-label classification method: we can assume that this pre-processing method improves classification performance on noisy datasets.
The classification results are still quite poor: the one-layer neural network does not perform as well as we wanted on this task. We actually got much better performance with a multilabel structured SVM on the initial dataset, but could not apply it to the pre-processed data because of technical issues.