Parameter Optimisation for Automated Feature
Point Detection
Dario Panada
March 28, 2016
Abstract
We made use of a Random-Forest Regression Voting implementation to
detect feature points on a series of radiography images of patients’ chests.
We explored the performance of the model when configured with different
parameter combinations, to determine whether these had any effect on
the accuracy of predictions.
Overall, our results suggest that increasing the number of random dis-
placements up to a certain amount contributes to enhanced performance,
beyond which, however, no significant improvement occurs. Patch size
yielded significant improvements in performance as it was increased, for
all values tested. Finally, increasing the number of decision trees in the
forest did not affect performance.
1 Introduction
Automatic detection of feature points is important for a variety of algorithms
and applications, such as object detection and recognition, motion tracking and
image alignment (e.g. panoramic mosaics). In this paper, we review how different
parameter combinations affect the performance of the technique proposed by
Lindner et al. (1). The authors proposed to make use of Random Forest
Regression-Voting (RFRV) applied within the Constrained Local Model (CLM)
framework to optimise the prediction of each point’s position. This is further
explained in sections 1.1 and 1.2.
1.1 Random Forest Regression Voting
Specifically, the technique proposed by the authors consists of training a
regressor for each feature point and, when presented with an unlabelled sample,
using such regressors to predict the position of each point. As a natural
extension, the authors then proposed to train multiple regressors for each
point and combine all regressor outputs to optimize the predicted position.
For each point x, a set of features fj(x + dj) is sampled at a set of random
displacements dj. The overall area surrounding the feature point from which
features can be sampled is referred to as a patch. A regressor δ = R(fj(x + dj))
is trained to predict the most likely position of x relative to x + dj.
Once trained, the regressors can be used to predict the position of an unknown
point x by considering a set of candidate positions in a region r. For each
candidate position rn in r, appropriate features are extracted and the regressor
δ is used to predict a position for x. This yields a histogram of votes V:

$$V(r_n) \leftarrow V(r_n) + c \qquad (1)$$

where c is the degree of confidence the model has that x coincides with rn.
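
To make the voting step concrete, the following is a minimal sketch of how such a histogram of votes could be accumulated over a search region, assuming hypothetical sample_features and predict interfaces; it does not reflect the actual API of the hclm tools.

```python
def cast_votes(candidates, sample_features, predict):
    """Accumulate the histogram of votes V of equation (1).

    candidates      : iterable of (row, col) candidate positions rn in region r
    sample_features : assumed callable mapping a position to a feature vector
    predict         : assumed callable mapping features to (delta, c), an offset
                      towards the predicted point position and a confidence
    """
    votes = {}
    for r_n in candidates:
        delta, c = predict(sample_features(r_n))
        # each evaluation casts its confidence c as a vote for the position
        # it predicts for the point, accumulating V as in equation (1)
        target = (r_n[0] + int(round(delta[0])), r_n[1] + int(round(delta[1])))
        votes[target] = votes.get(target, 0.0) + c
    # the position with the highest accumulated confidence is the estimate for x
    return max(votes, key=votes.get), votes
```

When multiple regressors are trained per point, as described above, their votes can simply be accumulated into the same histogram.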
1.2 Constrained Local Model
A Constrained Local Model (CLM) refers to a family of methods used to predict
the most likely positions of a set of points in a target image.
As a first step, the statistical shape model is trained by applying Principal
Component Analysis (PCA) to the aligned training samples, yielding a linear
model of shape variation. Such a model can be used to represent the position of
each point l as xl based on a series of parameters, including the average position
of l throughout the training set ($\bar{x}_l$), the shape model parameters ($b$)
and a set of modes of variation ($P_l$):

$$x_l = T_\theta(\bar{x}_l + P_l b + r_l) \qquad (2)$$

where $T_\theta$ applies a global (e.g. similarity) transformation with
parameters θ and $r_l$ allows a small residual displacement of each point.
After training, the model is matched to each new image I by seeking the
parameter combination that optimizes the overall quality of fit. This is achieved
by sampling, for each point, a region which we believe will contain our target
point. The algorithm then computes the cost of having the point at each pixel
in that region and, by manipulating the parameters, looks for the combination of
points' positions that maximizes the overall quality of fit Q:

$$Q(p) = \sum_{l=1}^{n} C_l\big(T_\theta(\bar{x}_l + P_l b + r_l)\big) \quad \text{s.t.} \quad b^T S_b^{-1} b \le M_t \text{ and } |r_l| < r_t \qquad (3)$$

where Cl is the cost of placing point l at the given position, Sb is the
covariance of the shape parameters, and Mt and rt are thresholds limiting the
shape and residual variation.
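
To illustrate the training of the shape model, below is a minimal PCA sketch over pre-aligned training shapes; it reflects standard statistical shape modelling under that assumption, not the exact hclm implementation.

```python
import numpy as np

def train_shape_model(shapes, n_modes=5):
    """Fit a linear shape model by PCA.

    shapes : (n_samples, 2 * n_points) array of aligned shapes flattened as
             (x1, y1, x2, y2, ...); alignment (e.g. Procrustes) is assumed
             to have been performed beforehand.
    """
    x_bar = shapes.mean(axis=0)                        # mean shape
    _, s, vt = np.linalg.svd(shapes - x_bar, full_matrices=False)
    P = vt[:n_modes].T                                 # modes of variation P
    variances = s[:n_modes] ** 2 / (len(shapes) - 1)   # variance of each mode
    return x_bar, P, variances

def shape_instance(x_bar, P, b):
    """Generate a shape from parameters b: x = x_bar + P b (equation 2,
    omitting the global transform T_theta and the residuals r_l)."""
    return x_bar + P @ b
```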
1.3 Combining the RFRV and the CLM
The RFRV is applied in the CLM framework to vote for the best position of
each feature point. From the histograms of votes Vl described earlier, one per
feature point, the aim is to combine all votes to maximize Q as defined in
equation 3, given that Cl = Vl and subject to the constraints imposed by our
CLM.
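
A minimal sketch of the objective being evaluated, assuming each vote histogram Vl is stored as a 2-D array indexed by image position; enforcing the shape and residual constraints of equation 3 is left to the surrounding optimiser.

```python
def quality_of_fit(vote_images, points):
    """Evaluate Q for one candidate configuration of points, with C_l = V_l
    (equation 3). vote_images[l] is the 2-D vote histogram for point l and
    points[l] its candidate (row, col) position."""
    return sum(v[r][c] for v, (r, c) in zip(vote_images, points))
```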
1.4 Parameter Optimisation
The performance of the model will be highly dependent on the choice of parame-
ters. Elements such as the number of random displacements used when training
the regressors, the size of the patches to use during training and the number of
regressors trained on each point will all affect the quality of predictions. While
intuitively it seems reasonable to believe that higher numbers of regressors and
displacements will provide more reliable outcomes, we must not forget that this
would increase computational costs. As such, we investigate different parameter
combinations and suggest which of these offers an optimal trade-off between
prediction accuracy and the cost of training the model.
The rest of this paper is structured as follows. In section 2 we describe our ex-
perimental method, including choice of parameters, ranges of values tested and
methods used to validate the results. In section 3 we present our experimental
results and perform statistical tests to decide whether a given parameter
significantly affects the model's performance, and in section 4 we analyse these
results. Finally, in section 5 we summarise our findings and suggest areas of
interest for future research based on our work.
2 Method
For this investigation we considered a set of chest X-ray images available
through the Image Sciences Institute (2). The standard downloadable package
contains, alongside the image files, a set of coordinates for points labelling
various organs such as the left and right lung and the heart. For the purpose of
this investigation, we will only make use of the left lung. In total, the dataset
consisted of 247 samples with the left lung being outlined by a total of 50 points.
Figure 1: Example of annotated image
Images were pre-processed and converted from their native raw format into JPEG
files. In addition, they were rescaled from the original resolution of 2048x2048
to 1024x1024 to fit the provided sets of points. Furthermore, all point files
were converted from their original distribution format to the one expected by
the hclm tools used in this research.
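
A sketch of the kind of conversion involved is given below. It assumes the raw images store 2048x2048 big-endian 16-bit pixels; the actual raw layout of the dataset may differ, in which case the dtype and shape would need adjusting.

```python
import numpy as np
from PIL import Image

def convert_radiograph(raw_path, jpg_path):
    """Convert one raw radiograph to a rescaled 8-bit JPEG."""
    # assumed layout: 2048x2048 big-endian unsigned 16-bit pixels
    pixels = np.fromfile(raw_path, dtype=">u2").reshape(2048, 2048)
    pixels = (255.0 * pixels / pixels.max()).astype(np.uint8)  # scale to 8 bit
    image = Image.fromarray(pixels).resize((1024, 1024))       # halve resolution
    image.save(jpg_path, "JPEG")
```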
The following parameters have been tested over the following ranges of values:

Parameter                        Tested Values
Number of Random Displacements   10, 20, 30, 40
Patch Size                       5x5, 10x10, 15x15, 20x20, 25x25
Number of Trees per Forest       1, 3, 5, 7

Table 1: List of parameter values which have been investigated
As one parameter was changed, the other two were kept fixed at the default
values given below.

Parameter                        Default Value
Number of Random Displacements   10
Patch Size                       17x17
Number of Trees per Forest       4

Table 2: Parameter default values

We made use of two-fold cross-validation to test our models. As each sample's
filename originally included its number (1 to 247), our first fold contains all
even-numbered samples and our second all odd-numbered ones.

Applying cross-validation brought several advantages to our investigation. Firstly,
it allowed us to obtain an accurate estimate of the accuracy of our regressors.
By testing our trained models on labelled samples, we were able to verify the
extent to which the predictions were accurate.
However, had we trained the models on all available samples, they would most
likely have overfit the data and hence given overly optimistic results.
(With the additional risk of then performing worse on future samples which
they had not seen during training.) A possibility would have been to exclude
some data from the training set and use it to test the model, which would how-
ever have meant we would not use all available resources. This leads to the
second advantage of cross-validation, which is that by alternatingly training on
a subset of the data and testing on the remainder we were able to effectively
make use of all available data while reducing the risk of overfitting to it.
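
The fold construction can be sketched as follows; the file-naming convention shown is hypothetical, and only the even/odd split on the sample number matters.

```python
def make_folds(sample_names):
    """Split samples into two folds: even sample numbers in one fold,
    odd in the other. Assumes names ending in the sample's number,
    e.g. 'sample_17' (an illustrative convention)."""
    even = [n for n in sample_names if int(n.rsplit("_", 1)[-1]) % 2 == 0]
    odd = [n for n in sample_names if int(n.rsplit("_", 1)[-1]) % 2 == 1]
    return [(even, odd), (odd, even)]   # (train, test) pairs

for train, test in make_folds([f"sample_{i}" for i in range(1, 248)]):
    # train a model on `train`, then measure point-to-point errors on `test`
    pass
```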
When tested on labelled data, results were expressed as cumulative distribution
function (CDF) curves, which give an indication of the mean point-to-point error
over all points. Corresponding y coordinates of the CDF curves obtained from the
two cross-validation models were averaged to produce a mean CDF curve.
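
A sketch of how such curves could be built and averaged is shown below, using placeholder error values; the grid of error values and its range are assumptions made for illustration.

```python
import numpy as np

def error_cdf(errors, grid):
    """Empirical CDF of per-image mean point-to-point errors, evaluated on a
    shared grid of error values so curves can be averaged pointwise."""
    errors = np.sort(np.asarray(errors))
    return np.searchsorted(errors, grid, side="right") / len(errors)

grid = np.linspace(0.0, 50.0, 200)        # error values in pixels (assumed range)
rng = np.random.default_rng(0)
fold1_errors = rng.gamma(2.0, 3.0, 123)   # placeholder errors, one per test image
fold2_errors = rng.gamma(2.0, 3.0, 124)
curve_a = error_cdf(fold1_errors, grid)
curve_b = error_cdf(fold2_errors, grid)
mean_curve = (curve_a + curve_b) / 2.0    # averaged y coordinates
```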
Finally, curves generated for each parameter were compared using the Kolmogorov-
Smirnov (KS) test at the p = 0.05 significance level to decide whether differences
between them are statistically significant. The KS test assesses the equality of
two distributions, and is therefore appropriate for determining whether two CDF
curves are significantly different. We applied the KS test by comparing the
y values of two given average CDF curves, having first ascertained that the
coordinate points of both curves shared the same x values. The entire curve was
considered when applying the KS test.
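
Continuing the sketch above, the KS comparison of two such curves could look as follows. The use of the asymptotic Kolmogorov distribution with an effective sample size is one standard formulation, not necessarily the exact procedure behind Tables 3 to 5.

```python
import numpy as np
from scipy.special import kolmogorov

def ks_compare(curve_a, curve_b, n1, n2):
    """Two-sample KS comparison of two CDF curves sampled on the same x grid."""
    d = float(np.max(np.abs(np.asarray(curve_a) - np.asarray(curve_b))))
    n_eff = n1 * n2 / (n1 + n2)                # effective sample size
    return d, kolmogorov(np.sqrt(n_eff) * d)   # statistic and asymptotic p-value

d, p = ks_compare(curve_a, curve_b, n1=123, n2=124)
print(f"D = {d:.3f}, p = {p:.4f}")   # difference significant if p < 0.05
```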
3 Results
3.1 Number of Random Displacements
We present performance results when using different numbers of random dis-
placements.
Figure 2: Performance of model for different numbers of random displacements
Below is a summary of whether results were found to be significantly different.
Curve A    Curve B    Difference Significant    p-value
10         20         Yes                       < 0.001
20         30         No                        0.9868
30         40         No                        0.6510

Table 3: Analysis of whether the difference between curves is statistically
significant; values in columns Curve A and Curve B indicate the number of
random displacements for the given curve
3.2 Patch Size
We present performance results when using different patch sizes.
Figure 3: Performance of model for different patch sizes
Below is a summary of whether results were found to be significantly different.

Curve A    Curve B    Difference Significant    p-value
5x5        10x10      Yes                       < 0.001
10x10      15x15      Yes                       < 0.001
15x15      20x20      Yes                       0.0042
20x20      25x25      Yes                       < 0.001

Table 4: Analysis of whether the difference between curves is statistically
significant; values in columns Curve A and Curve B indicate the patch sizes
for the given curve
3.3 Number of Trees
We present performance results for models trained with varying number of trees.
Figure 4: Performance of model for different numbers of trees
Below is a summary of whether results were found to be significantly different.
Curve A    Curve B    Difference Significant    p-value
1          7          No                        0.936

Table 5: Analysis of whether the difference between curves is statistically
significant; values in columns Curve A and Curve B indicate the number of
trees for the given curve
4 Analysis
Our results suggest that enhancing performance via parameter optimisation is
possible, although it is not the case that increasing any parameter by an arbi-
trary amount necessarily leads to an improvement in performance.
We will start by considering our first parameter, the number of random
displacements. What Figure 2 intuitively suggests, namely that an appreciable
improvement in performance is detectable exclusively between 10 and the other
values, is confirmed by the results presented in Table 3.
Our results show that the difference between 10 and 20 is statistically
significant, but no other tested difference is. As such, we are confident in
assuming that the untested difference between 20 and 40 is not significant either.
This suggests that setting the number of random displacements D at 20 already
allows for the best performance that the model can offer, with additional
displacements not yielding any appreciable changes. To try and understand why
this is, we can think back to the description given in section 1.1.
During the training phase, each regressor δ is repeatedly trained on features
sampled at x + dj to predict the position of x (a known training point) from
the displaced position. This knowledge is then applied when presented with
an unlabelled sample, when the regressor will make use of features at different
positions in the given region to predict the position of l (an unknown point
with the same index as x). As each prediction made while considering a new
sample is essentially a prediction of the position of l made from a displacement
l + dj, it makes sense to assume that the more displaced positions δ has seen
during training, the more holistic an understanding of the surrounding region
it will have developed, and hence the better it will be able to predict the
position of l from a random displacement. This
would explain why increasing the number of random displacements from 10 to
20 results in an appreciable increase in performance.
However, it is also reasonable to assume that after a sufficient number of
training displacements D_optimal, δ will have learned the region surrounding x
sufficiently well to make an accurate prediction of the position of l when
presented with an unlabelled sample. As such, any additional displacements in
excess of D_optimal would not increase the ability of δ to make use of the
region surrounding the point when making a prediction, which explains why there
is no significant difference between 20, 30 and 40.
It might also be the case that although we increased the number of random dis-
placements, the overall region where these displacements took place remained of
limited size. As such, it might be possible that increasing the size of the region
could offer enhanced performance for higher numbers of random displacements,
although we suspect that eventually a plateau would still be reached. Overall,
however, our results agree with those cited for this parameter in the study by
Lindner et al. (3), which suggested 20 as an optimal value for the number of
random displacements. Indeed, we found an improvement between 10 and 20 and
no further improvement for larger values, suggesting that the two studies led
to similar results.
With regards to performance improving as the patch size is increased, this is not
a surprising result. Intuitively, the larger a learning space the regressor is given
the more features it should be able to learn and the better an understanding it
should obtain about the region surrounding the point. It may also be the case
that some features are not detectable if the regressor is constrained to
learning from too small a region, which, if true, would be another reason why
increasing patch sizes leads to enhanced performance. To summarise, for the
frame width we used (200), our results are consistent with those found by
Lindner et al. (4), which suggest that larger patch sizes lead to enhanced
performance.
It is possible, though further experiments would be needed to ascertain this,
that particularly large patches might eventually lead to performance not
significantly better, or even worse, than that currently obtained. This could be due
to features too distant from the target point being irrelevant or even misleading.
Finally, we notice that increasing the number of trees does not lead to any
improvement in performance. While this is somewhat surprising, we ought to
remember that decision trees themselves are known to provide good levels of
performance. Therefore, it might simply be the case that for this particular
dataset the performance of a single decision tree was sufficiently good that ex-
panding it into a Random Forest did not offer appreciable improvements. An
alternative explanation could be that other parameters not tested in this study
also significantly contribute to performance. The downside of having a reduced
number of decision trees would then be outweighed by the optimal configuration
of other settings which have not been explored here.
While our results for this parameter manipulation do not reproduce the findings
of Lindner et al. (5), we would like to note that on this occasion we made use of
a one-stage model whereas that mentioned in the previous research makes use
of a two-stage model. Because the previous study suggests that the two-stage
model approach leads to better performance, we are not entirely surprised that
our model did not perform as well on all occasions.
On the other hand, it might be that significant differences would be observable
for larger numbers of trees in the forest.
5 Conclusion
We have tested the performance of the model across different ranges of parame-
ters. We have evidence to suggest that while increasing the values of parameters
such as the number of random displacements and the patch size can lead to
enhanced performance, this is not indiscriminately true. In particular, increasing random
displacements beyond a certain value did not offer any improvement in perfor-
mance and, for this dataset, the number of trees did not seem to affect results.
We believe that future investigations should be focused firstly on the patch
size to determine whether there exists a value after which performance stops
improving significantly or even degrades, possibly for the reasons discussed
in the previous section.
Also, the effect of a larger number of decision trees in the forest would be
worth studying, perhaps on additional datasets, to determine whether this
dataset in particular was "easy" enough for a single tree to learn it
sufficiently well.
Finally, it would be worth investigating whether increasing the size of the re-
gion surrounding feature points, used to generate random displacements, allows
for growing numbers of random displacements to offer further enhancements in
performance.
6 Bibliography
(1) Lindner, C.; Bromiley, P.A.; Ionita, M.C.; Cootes, T.F., "Robust and
Accurate Shape Model Matching Using Random Forest Regression-Voting," IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9,
pp. 1862-1874, Sept. 2015.
(2) van Ginneken, B.; Stegmann, M.B.; Loog, M., "Segmentation of anatomical
structures in chest radiographs using supervised methods: a comparative study
on a public database," Medical Image Analysis, vol. 10, pp. 19-40, 2006.
(3) Lindner et al. (2015)
(4) Ibid.
(5) Ibid.
Word Count: 2388