2. Denoising
(General) To retrieve a clean dataset by deleting outliers.
(Computer Vision) The recovery of a digital image that has been contaminated by additive white Gaussian noise.
Figures: noisy spiral dataset; handwritten digit recognition; noisy image.
3. Manifold Blurring Mean Shift algorithm (MBMS)
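Blurring mean-shift update, computed over the k nearest neighbors $N_n$ of $x_n$:
$\partial x_n = -x_n + \sum_{m \in N_n} \frac{K(x_n, x_m)}{\sum_{m' \in N_n} K(x_n, x_{m'})} \, x_m$, where K is a Gaussian kernel: $K(x_n, x_m) = \exp\left(-\tfrac{1}{2} \| x_n - x_m \|^2 / \sigma^2\right)$
Projection with PCA onto the subspace orthogonal to the local L-dimensional tangent space:
$\partial x_n \leftarrow (I - U_n U_n^\top) \, \partial x_n$, such that the columns of $U_n$ are the L leading principal directions of the k nearest neighbors of $x_n$; then $x_n \leftarrow x_n + \partial x_n$.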
Parameters:
σ, which sets the variance of the Gaussian kernel;
k, the number of neighbors to consider;
L, the local intrinsic dimension;
The number of iterations of the whole algorithm.
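The update and its four parameters can be sketched in plain Python for two-dimensional points. This is a minimal illustration, not the implementation used for these slides: the function names (`principal_direction`, `mbms_step`) are hypothetical, the point itself is counted among its k nearest neighbors, and the local PCA is done in closed form, which only works in 2D.

```python
import math


def principal_direction(points):
    """Unit vector along the leading principal axis of 2D points (local PCA, L = 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Closed-form orientation of the major axis of a 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def mbms_step(points, k, sigma, L=1):
    """One MBMS iteration on a list of (x, y) tuples; L must be 0, 1 or 2."""
    new_points = []
    for x in points:
        def sq_dist(p):
            return (p[0] - x[0]) ** 2 + (p[1] - x[1]) ** 2

        # k nearest neighbors (assumption: the point itself is included).
        nbrs = sorted(points, key=sq_dist)[:k]
        weights = [math.exp(-0.5 * sq_dist(p) / sigma ** 2) for p in nbrs]
        total = sum(weights)
        # Blurring mean-shift step: Gaussian-weighted mean minus the point.
        dx = sum(w * p[0] for w, p in zip(weights, nbrs)) / total - x[0]
        dy = sum(w * p[1] for w, p in zip(weights, nbrs)) / total - x[1]
        if L == 1:
            # Remove the part of the step that lies along the local tangent.
            ux, uy = principal_direction(nbrs)
            along = dx * ux + dy * uy
            dx -= along * ux
            dy -= along * uy
        elif L == 2:
            # The tangent space is the whole plane: nothing is left of the step.
            dx = dy = 0.0
        new_points.append((x[0] + dx, x[1] + dy))
    return new_points


# A zig-zag around the line y = 0 stands in for a noisy 1D manifold.
zigzag = [(float(i), 0.3 if i % 2 else -0.3) for i in range(20)]
denoised = mbms_step(zigzag, k=6, sigma=2.0, L=1)
```

On the zig-zag, the points are pulled toward the line while their position along it barely changes, which is the intended behavior of the projection step.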
4. Setting the parameters: the kernel variance
σ is related to the level of local noise off the manifold;
The larger it is, the stronger the denoising effect;
But a large σ can distort the manifold shape over iterations.
Trade-off between the kernel variance and the number of iterations.
5. Setting the parameters: the number of neighbors
k is the number of nearest neighbors used to estimate the local tangent space;
MBMS is quite robust to it. It typically grows sublinearly with N.
However, it strongly affects the mean-shift blurring, as each point is moved toward the Gaussian-weighted mean of its neighbors.
Trade-off between the number of neighbors and the kernel variance.
6. Setting the parameters: the intrinsic dimensionality
If L is too small, it produces more local clustering and can distort the manifold;
If L is too big, points move very little: if L is equal to the dimension of the dataset, there is no motion at all.
Since we use 2D datasets, we usually choose L=1, except for the GBMS algorithm (L=0).
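The three cases for L in a 2D dataset can be checked with a small worked example. The helper `remove_tangent` is a hypothetical name, and the tangent basis is simply assumed to be the coordinate axes rather than estimated by PCA.

```python
def remove_tangent(step, basis):
    """Subtract from a 2D step its components along the orthonormal vectors in basis."""
    rx, ry = step
    for ux, uy in basis:
        along = step[0] * ux + step[1] * uy
        rx -= along * ux
        ry -= along * uy
    return (rx, ry)


step = (3.0, 4.0)  # an arbitrary mean-shift step

no_pca = remove_tangent(step, [])                       # L = 0 -> (3.0, 4.0)
one_dim = remove_tangent(step, [(1.0, 0.0)])            # L = 1 -> (0.0, 4.0)
full = remove_tangent(step, [(1.0, 0.0), (0.0, 1.0)])   # L = 2 -> (0.0, 0.0)
```

With L=0 the full mean-shift step is kept (GBMS), with L=1 only the component orthogonal to the tangent survives, and with L=2 nothing is left to move the point.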
7. Setting the parameters: the number of iterations
A few iterations (1 to 5) achieve most of the denoising.
More iterations can refine the result, but the dataset may start to shrink.
Trade-off between the number of iterations and the other parameters.
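Shrinkage is easiest to see in the L=0 case. The sketch below is an illustration under simplifying assumptions: it applies plain Gaussian blurring mean shift to eleven one-dimensional points, uses all points as neighbors, and picks the value of sigma arbitrarily. It records the spread of the data after each iteration.

```python
import math


def gbms_step_1d(xs, sigma):
    """One Gaussian blurring mean-shift iteration on 1D data (all points are neighbors)."""
    out = []
    for x in xs:
        weights = [math.exp(-0.5 * ((x - y) / sigma) ** 2) for y in xs]
        out.append(sum(w * y for w, y in zip(weights, xs)) / sum(weights))
    return out


xs = [i / 10 for i in range(11)]  # evenly spaced points on [0, 1]
spread = []
for _ in range(3):
    xs = gbms_step_1d(xs, sigma=0.3)
    spread.append(max(xs) - min(xs))
# spread holds the data range after 1, 2 and 3 iterations; it gets smaller each time.
```

Each point is replaced by a weighted average of the data, so the range can only contract. This is why the number of iterations has to be balanced against sigma.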
8. Spiral dataset
Pinwheel.m generates small two-dimensional datasets made of noisy spirals.
(Credit: Harvard Intelligent Probabilistic Systems)
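For readers without access to Pinwheel.m, a comparable test set can be produced with a few lines of Python. This is not a port of Pinwheel.m: the function name `noisy_spiral`, its parameters and the Archimedean spiral shape are all assumptions chosen for illustration.

```python
import math
import random


def noisy_spiral(n_points=200, turns=1.5, noise=0.05, seed=0):
    """Points along a spiral of growing radius, with isotropic Gaussian noise added."""
    rng = random.Random(seed)  # a fixed seed makes the dataset reproducible
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)            # position along the curve, in [0, 1]
        angle = 2.0 * math.pi * turns * t
        x = t * math.cos(angle) + rng.gauss(0.0, noise)
        y = t * math.sin(angle) + rng.gauss(0.0, noise)
        points.append((x, y))
    return points


spiral = noisy_spiral()
```

Setting `noise=0` returns the clean curve, which gives a reference against which a denoised result can be compared.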
27. MNIST Dataset Classification
Input: 16x8 matrices of 0s and 1s, each representing the image of a letter.
Parameters:
k = 4 (must be an even number);
L = 1; sigma = 1;
n_iteration = 1.
Preprocessing algorithm:
Extract the coordinates of the "1" elements, i.e. the white points. For example, if m1,3 = 1, we extract the point (1, 3).
Denoising step.
If a resulting coordinate is not an integer, we round it. For example, if a pixel is to be moved to the coordinates (12.54, 14.1), we round them to (13, 14).
The vector obtained is converted back into a matrix of 0s and 1s.
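The conversion between a binary image and a point set can be sketched as follows. The function names are hypothetical, indices are 0-based here (the slide counts from 1), and dropping points that land outside the image is an assumption, since the slides do not say how that case is handled.

```python
import math


def matrix_to_points(matrix):
    """Coordinates (row, column) of every element equal to 1."""
    return [(r, c)
            for r, row in enumerate(matrix)
            for c, value in enumerate(row)
            if value == 1]


def round_half_up(value):
    """Round to the nearest integer, with .5 going up (unlike Python's round)."""
    return math.floor(value + 0.5)


def points_to_matrix(points, n_rows, n_cols):
    """Rebuild a 0/1 matrix from possibly non-integer point coordinates."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        r, c = round_half_up(x), round_half_up(y)
        # Assumption: points pushed outside the image are discarded.
        if 0 <= r < n_rows and 0 <= c < n_cols:
            matrix[r][c] = 1
    return matrix


image = [[0, 1, 0],
         [0, 1, 1]]
points = matrix_to_points(image)          # [(0, 1), (1, 1), (1, 2)]
rebuilt = points_to_matrix(points, 2, 3)  # identical to image
```

The denoising step would sit between the two calls, moving the extracted points before they are rounded back onto the pixel grid. Two points rounded to the same pixel merge into one.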
28. MNIST Dataset Classification
General algorithm:
We train a neural network that labels the dataset.
We compute the correct labelling rate.
We denoise the images.
We train a new neural network.
We compute the correct labelling rate.
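The five steps above can be written as a small harness. It is only a sketch: `train` and `denoise` are hypothetical callables supplied by the caller (the slides do not describe the network), and both rates are measured on the same data used for training.

```python
def labelling_rate(predicted, expected):
    """Fraction of labels in predicted that match expected."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def compare_denoising(train, denoise, images, labels):
    """Return (rate without denoising, rate with denoising).

    train(images, labels) must return a function mapping one image to a label;
    denoise(image) must return the denoised image. Both are placeholders.
    """
    baseline = train(images, labels)
    rate_raw = labelling_rate([baseline(im) for im in images], labels)

    cleaned = [denoise(im) for im in images]
    model = train(cleaned, labels)
    rate_clean = labelling_rate([model(im) for im in cleaned], labels)
    return rate_raw, rate_clean
```

Evaluating on a held-out test set instead only requires passing separate test images and labels to the two `labelling_rate` calls.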
29. MNIST Dataset Classification
Results:
We first run the algorithm on the whole dataset, and then on separate training and test sets. We compare the correct labelling rates.
Correct labelling rates:
              Dataset   Training/test dataset
No blurring   51%       35%
Blurring      53%       39%
30. Conclusion
The Manifold Blurring Mean Shift algorithm blurs an image in order to:
Erase some outliers by merging them into the "real" image;
Merge outliers together, decreasing their number;
Decrease the error rate of a labelling method.
The resulting image is more congruent for the human eye, and also more congruent for automatic classification.