Genetic Programming based Image Segmentation with Applications to Biomedical Object Detection. Published paper of our research work. Published at Genetic and Evolutionary Computation Conference (GECCO) 2009.
Genetic Programming based Image Segmentation with
Applications to Biomedical Object Detection
Tarundeep Singh Dhot, Nawwaf Kharma
Department of Electrical and Computer Engineering
Concordia University, Montreal, QC H3G 1M8
t_dhot@encs.concordia.ca, kharma@ece.concordia.ca
Mohammad Daoud
Department of Electrical and Computer Engineering
University of Western Ontario
London, ON, N6A 3K7
mohammad.dauod@gmail.com
Rabab Ward
Department of Electrical and Computer Engineering
University of British Columbia
Vancouver, BC, V6T 1Z4
rababw@ece.ubc.ca
ABSTRACT
Image segmentation is an essential process in many image analysis applications and is mainly used for automatic object recognition purposes. In this paper, we define a new genetic programming based image segmentation algorithm (GPIS). It uses a primitive image-operator based approach to produce linear sequences of MATLAB® code for image segmentation. We describe the evolutionary architecture of the approach and present results obtained after testing the algorithm on a biomedical image database for cell segmentation. We also compare our results with another EC-based image segmentation tool called GENIE Pro. We found the results obtained using GPIS were more accurate as compared to GENIE Pro. In addition, our approach is simpler to apply and evolved programs are available to anyone with access to MATLAB®.

Categories and Subject Descriptors
I.4.6 [Image Processing and Computer Vision]: Segmentation – pixel classification.

General Terms
Algorithms, Experimentation.

Keywords
Image Segmentation, Genetic Programming.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO’09, July 8–12, 2009, Montréal Québec, Canada.
Copyright 2009 ACM 978-1-60558-325-9/09/07...$5.00.

1. INTRODUCTION
Image segmentation is the process of extraction of objects of interest from a given image. It allows certain regions in the image to be identified as an object based on some distinguishing criteria, for example, pixel intensity or texture. It is an important part of many image analysis techniques as it is a crucial first step of the imaging process and greatly impacts any subsequent feature extraction or classification. It plays a critical role in automatic object recognition systems for a wide variety of applications like medical image analysis [8, 9, 14, 15], geosciences and remote sensing [2, 3, 4, 5, 10, 11], and target detection [10, 11, 16].

However, image segmentation is an ill-defined problem. Even though numerous approaches have been proposed in the past [7, 12, 13], there is still no general segmentation framework that can perform adequately across a diverse set of images [1]. In addition, most image segmentation techniques exhibit a strong domain or application-type dependency [7, 12, 17]. Automated segmentation algorithms often include a priori information of its subjects [8], making use of well-designed segmentation techniques restricted to a small set of imagery.

In this paper, we propose a new, simple image segmentation algorithm called Genetic Programming based Image Segmentation (GPIS) that uses a primitive image-operator based approach for segmentation and present results. The algorithm does not require any a priori information about objects to be segmented other than a set of training images. In addition, the algorithm is implemented on MATLAB® and uses its standard image-function library. This allows easy access to anyone with MATLAB®.

In the following sections, we provide a brief introduction to relevant work in GP based image segmentation and image analysis, followed by an overview of our approach in Section 1.3. Section 2 describes the methodology of our algorithm and the
experimental setup for compiling results. Finally, Section 3 presents the results of the experiments conducted on a biomedical image database for cell segmentation purposes. We also compare our results with another EC-based image segmentation algorithm called GENIE Pro.

1.1 Related Work
One of the initial works in this field was published by Tackett [16] in 1993. He applied GP to develop a processing tree capable of classifying features extracted from IR images. These evolved features were later used to construct a classifier for target detection. On the same lines, in 1995, Daida et al. [5, 6] used GP to derive spatial classifiers for remote sensing purposes. This was the first time GP was used for image processing applications in geosciences and remote sensing.

In 1996, Poli [14] proposed an interesting approach to image analysis based on evolving optimal filters. The approach viewed image segmentation, image enhancement and feature detection purely as a filtering problem. In addition, he outlined key criteria while building terminal sets, function sets and fitness functions for an image analysis application.

In 1999, Howard et al. [10, 11] presented a series of works using GP for automatic object detection in real world and military image analysis applications. They proposed a staged evolutionary approach for evolution of target detectors or discriminators. This resulted in achieving practical evolution times.

In 1999, another interesting approach was proposed by Brumby et al. [4]. They used a hybrid evolutionary approach to evolve image extraction algorithms for remote sensing applications. These algorithms were evolved using a pool of low level image processing operators. On the same lines, Bhanu et al. [2, 3] used GP to evolve composite operators for object detection. These operators were synthesized from combinations of primitive image processing operations used in object detection. In order to control the code-bloat problem, they also proposed size limits for the composite operators.

In 2003, Roberts and Claridge [15] proposed a GP based image segmentation technique for segmenting skin lesion images. A key feature of their work was the ability of the GP to generalize based on a small set of training images.

Our approach is motivated by the works of Tackett [16], Brumby et al. [4] and Bhanu et al. [2, 3]. They all effectively implemented a primitive image operator based approach for image analysis. This is similar to our approach. In addition, we have used the key criteria outlined by Poli [14] as references while building our algorithm.

1.2 GENIE Pro
GENIE Pro [4, 9] is a general purpose, interactive and adaptive GA-based image segmentation and classification tool. GENIE Pro uses a hybrid GA to assemble image-processing algorithms or pipelines from a collection of low-level image processing operators (for example edge detectors, texture measures, spectral orientations and morphological filters). The role of each evolved pipeline is to classify each pixel as feature or non-feature.

The GA begins with a population of random pipelines, performs fitness evaluation for each pipeline in the population and selects the fitter pipelines to produce offspring pipelines using crossover and mutation. In order to compute the fitness of a pipeline, the resultant segmentation produced by the pipeline is compared to a set of training images. These training images are produced by manual labeling of pixels by the user as True (feature) or False (non-feature) pixels using an in-built mark-up tool called ALLADIN. Finally, when a run of GENIE Pro is concluded, the fittest pipeline in the population is selected and combined using a linear classifier (Fisher Discriminant) to form an evolved solution that can be used to segment new images.

GENIE Pro was developed for analyzing multispectral satellite data. It has also been applied to biomedical feature-extraction problems [9]. We have used it for comparison purposes.

1.3 Overview of Our Work
In this paper, we describe a new genetic programming based image segmentation algorithm, GPIS, that uses a primitive image-operator based approach for segmentation. Each segmentation algorithm can be viewed as a unique combination of image analysis operators that are successfully able to extract desired regions from an image. If we are able to describe a sufficient set of these image analysis operators, it is possible to build multiple segmentation algorithms that segment a wide variety of images. In GPIS, we define a pool of low level image analysis operators. The GP searches the solution space for the best possible combination of these operators that is able to perform the most accurate segmentation. From now on, we refer to these image analysis operators as primitives. Each individual in a population is a combination of these primitives and represents an image segmentation program. Therefore, GPIS typically breeds a population of segmentation programs in order to evolve one accurate image segmentation program.

2. METHODOLOGY
The proposed algorithm GPIS is designed as a general tool for learning based segmentation of images. In this paper, particular attention is given to testing it on biomedical images. Our approach does not require a particular image format or size and works equally well on both color and grayscale images in any MATLAB® compatible format.

For the purpose of learning, a directory with both input images and matching ground truths (GTs) must be provided. From this point onwards, we call this a training set. Every input image must have a corresponding GT of the same size and format. The GT image is a binary image showing the best assessment of the boundaries of the objects of interest; all pixels inside those boundaries are by definition object pixels and all pixels outside the boundaries are by definition non-object pixels. Pixels on the boundary itself are by definition also object pixels.

GPIS has two stages of operation. Stage 1 is a learning phase in which GPIS uses the training set to evolve a MATLAB® program which meets a user-defined threshold of segmentation accuracy relative to the input images of the training set.

In the second stage, this evolved individual is evaluated for its ability to segment unseen images of the same type as the training images. The accuracy results achieved here are from here on called validation accuracy.

In a real world situation, due to lack of GTs for unseen images, validation accuracy will take the form of the subjective assessment
of a human user. However, for this paper, the authors evaluate the quality, i.e. the validation accuracy, of the individual evolved by GPIS by comparing its segmentation results to their matching GT images. We report the results of our evaluation in the Results section (Section 3) of this paper.

2.1 Stage 1: Learning phase of GPIS
GPIS operates in a typical evolutionary cycle in which a population of potential program solutions (each meant to segment images) is subjected to repeated selection and diversification until at least one of the individuals meets the termination criteria. The flowchart of the learning stage is presented in Figure 1.

[Figure 1: flowchart with the blocks START, Initialization, Fitness Evaluation and "Termination Criteria met?". Yes: Output (Fittest individual), STOP. No: Elitism (copy elite) and Parent Selection (parents), Genetic Diversification (offspring), Survivor Aggregation (Σ) with Injection, then back to Fitness Evaluation for the next generation.]
Figure 1. Flowchart of GPIS

2.1.1 Representation and Initialization
In our scheme, the genome of an individual encodes a MATLAB® program that processes an image. The input to the program is an image file and the output of the execution of the MATLAB® program is an image of the same size and format. This output image file is a segmented version of the input image.

Each chromosome represents a complete MATLAB® segmentation program. There is a one-to-one mapping between the genome and the phenome as shown in Figure 2 (c). It also shows the representation of the knowledge structure used by the genetic learning system.

(a) [Operator Name, Input Plane 1, Input Plane 2, Weights, SE/FP]

(b) [G1] [G2] [G3] [G4] [G5] ......... [Gn]

(c) GENOME (genes, shown as "....") mapped one-to-one to PHENOME (MATLAB® code):

    d1 = input;
    h1 = fspecial('disk',[6 6]);
    io1 = imfilter(d1, h1);
    SE1 = strel('square', 2);
    io2 = imerode(io1, SE1);
    io3 = imclose(io2, SE1);
    io4 = imadd(io2,io3);
    out = im2bw(io4, 0.55);

Figure 2. (a) Typical layout of a gene (b) Typical layout of a chromosome comprising n genes (c) One-to-one mapping of the genome and phenome

We use a pool of 20 primitive operators. Table 1 provides the complete list of all primitive image analysis operators in the gene pool along with the typical number of inputs required for each operator.

Initialization creates a starting population for the GP. The initial population of the GP is randomly generated, i.e. chromosomes are formed by a randomly assigned sequence of operators. The genomic initialization is also random, i.e. parameter values of operators are also assigned randomly, based on the operator type. For practical reasons, the size of each chromosome is limited to a maximum length of 15. In addition, at the time of initialization, the size of the population along with the values of the crossover and mutation rates are assigned by the user.
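The random initialization described above can be illustrated with a minimal Python sketch. It is an illustration only and not the authors' MATLAB® implementation. The five-part gene layout of Figure 2 (a), the random choice of operators and parameter values, and the maximum chromosome length of 15 come from the text; the `Gene` field names, the parameter ranges, the subset of operator names taken from Table 1, and the rule that a gene may only read planes produced by earlier genes are assumptions made here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

MAX_CHROMOSOME_LENGTH = 15  # limit stated in Section 2.1.1

# Operator name -> number of input planes (a subset of Table 1).
OPERATOR_INPUTS = {
    "ADDP": 2, "SUBP": 2, "DIFF": 2,   # arithmetic
    "AVER": 1, "DISK": 1, "GAUS": 1,   # filters
    "DIL": 1, "ERODE": 1, "OPEN": 1,   # morphological
    "THRES": 1,                        # post-processing
}


@dataclass
class Gene:
    operator: str            # part 1: name of the primitive operator
    input_1: int             # part 2: index of the first input plane
    input_2: Optional[int]   # part 3: second input plane, None for one-input operators
    weight: float            # part 4: parameter value (range is an assumption)
    se_or_fp: int            # part 5: SE / FP code (encoding is an assumption)


def random_gene(position: int, rng: random.Random) -> Gene:
    """Build one random gene. Plane 0 is the input image and plane k is the
    output of gene k-1, so a gene at `position` may read planes 0..position
    (assumed rule, not stated in the paper)."""
    name = rng.choice(sorted(OPERATOR_INPUTS))
    first = rng.randint(0, position)
    second = rng.randint(0, position) if OPERATOR_INPUTS[name] == 2 else None
    return Gene(name, first, second, rng.uniform(0.0, 1.0), rng.randint(1, 6))


def random_chromosome(rng: random.Random) -> List[Gene]:
    """A chromosome is a linear sequence of 1 to 15 random genes."""
    length = rng.randint(1, MAX_CHROMOSOME_LENGTH)
    return [random_gene(i, rng) for i in range(length)]


def init_population(size: int, seed: Optional[int] = None) -> List[List[Gene]]:
    """Create `size` random chromosomes; `seed` makes the run repeatable."""
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(size)]
```

For example, `init_population(200, seed=1)` would build a starting population of the size listed in Table 2.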
The general layout of a gene is shown in Figure 2 (a). As seen in the figure, each gene specifies information about the primitive operator it encodes, the input images to the operator and parameter settings for the operator. This corresponds to a few lines (1-3) of the equivalent MATLAB® program. The gene consists of five parts. The first part contains the name of the primitive operator and the second and third parts contain the possible input images to the operator. Based on the nature of the primitive operator, a gene may have one or two input images. The fourth part contains weights or parameter values for the primitive operator and the fifth part encodes the nature of the Structuring Element or SE (only in case of morphological operations) or a secondary Filter Parameter or FP (only in case of filter operators).

The phenomic representation (chromosome) is a linear combination of the genes, as shown in Figure 2 (b).

2.1.2 Fitness Evaluation
A segmented image consists of positive (object) and negative (non-object) pixels. Ideally the segmentation of an image would result in an output image where positive pixels cover object pixels perfectly and the negative pixels cover non-object pixels perfectly. Based on this idea, we can view segmentation as a pixel-classification problem. The task of the segmentation program now becomes assignment of the right class to every pixel in the image. As such, we can apply a measure of classification accuracy to the problem of image segmentation. Every segmentation program can be expected not only to identify pixels belonging to the objects of interest (True Positives, TPs), but also to label some object pixels as non-object pixels (False Negatives, FNs). Further, in addition to identifying non-object pixels (True Negatives, TNs), some pixels belonging to non-objects can be identified as object pixels
(False Positives, FPs).

Therefore, for an ideal segmentation, the number of FPs and FNs should be zero while the number of TPs and TNs should be exactly equal to the number of object and non-object pixels. If we normalize the value of TPs and TNs by the total number of object and non-object pixels respectively, their individual values would be 1 in the best case scenario and 0 in the worst case scenario. However, for the segmentation problem, achieving this is a challenging task, thus we define two more measures based on TPs, TNs, FPs and FNs called the False Positive Rate (FPR) and False Negative Rate (FNR). FPR is the proportion of non-object pixels that were erroneously reported as being object pixels. FNR is the proportion of object pixels that were erroneously reported as non-object pixels. Therefore, for an ideal segmentation, the values of FPR and FNR should be zero. For finding the accuracy of a segmentation program, we use a pixel-based accuracy formula based on FPR and FNR. This formula reflects the training and validation accuracy for GPIS. It is as follows:

Accuracy = (1-FPR) (1-FNR) (1)

where FPR represents False Positive Rate and FNR represents False Negative Rate. The above formula for accuracy extends the image segmentation problem to a pixel-classification problem. Therefore, ideally the value of accuracy should be 1 (or 100%) for a perfectly segmented image. We also see that the formula is mono-modal, i.e. if image A is better segmented than image B, then Accuracy (A) > Accuracy (B).

However, we further extend this formula by introducing a term that penalizes longer programs. The fitness function for GPIS is as follows:

Fitness = (1-FPR) (1-FNR) - β × len (2)

where FPR represents False Positive Rate, FNR represents False Negative Rate, len represents the length of the program, and β is a scaling factor for the length of a program, such that β ∈ [0.004, 0.008]. We found this range sufficient for our purpose.

Table 1. Primitive image analysis operators in the gene pool

Operator Name   Description              Inputs   Operator Type
ADDP            Add Planes               2        Arithmetic
SUBP            Subtract Planes          2        Arithmetic
MULTP           Multiply Planes          2        Arithmetic
DIFF            Absolute Difference      2        Arithmetic
AVER            Averaging Filter         1        Filter
DISK            Disk Filter              1        Filter
GAUS            Gaussian Filter          1        Filter
LAPL            Laplacian Filter         1        Filter
UNSHARP         Unsharp Filter           1        Filter
LP              Lowpass Filter           1        Filter
HP              Highpass Filter          1        Filter
DIL             Image Dilate             1        Morphological
ERODE           Image Erode              1        Morphological
OPEN            Image Open               1        Morphological
CLOSE           Image Close              1        Morphological
OPCL            Image Open-Close         1        Morphological
CLOP            Image Close-Open         1        Morphological
HISTEQ          Histogram Equalization   1        Enhancement
ADJUST          Image Adjust             1        Enhancement
THRES           Thresholding             1        Post-processing

2.1.3 Termination Criteria
Termination of the GP is purely fitness based and the evolutionary cycle continues until there is no major change in fitness over 10 generations. In order to do this, first we calculate a minimum acceptable fitness value based on our trial runs. This value was found to be 95% for the database in use. Until this fitness value is achieved, the GP keeps running. Once it is reached, the cumulative mean of the fitness of successive generations is calculated. If the absolute difference between the means of 10 successive generations is less than 5% of the highest fitness achieved, the GP stops. If however, the GP is used on any other database, a default value of 90% is set. The termination criteria can be defined as follows:

|current fitness - mean fitness (10 gen)| < 0.05 × highest fitness

2.1.4 Parent Selection
Parent selection is done to select chromosomes that undergo diversification operations. In order to do this, we use a tournament selection scheme. It is chosen instead of rank selection as it is computationally more efficient. The size of the tournament window λ is kept at 10% of the size of the population. The number of parents selected is 50% of the size of the population.

2.1.5 Elitism
We use elitism as a means of saving the top 1% of the chromosomes of a population. The best 1% of the chromosomes in the population are copied without change to the next generation.

2.1.6 Diversification
We employ five genetic operators in total: one crossover and four mutation operators. These are selected probabilistically based on their respective rates of crossover and mutation.

Crossover: We use a 1-point crossover for our GP. Two parents are chosen randomly from the parent pool. A random location is chosen in each of the parent chromosomes. The subsequences before and after this location in the parents are exchanged, creating two offspring chromosomes.

Mutation: We use four mutation operators for our GP. There are three inter-genomic mutation operators, namely, swap, insert and
delete and one intra-genomic mutation operator, alter, which typically alters the weight element of the selected gene. The gene to be mutated is randomly chosen from the selected parent chromosome.

2.1.7 Injection
In order to overcome loss of diversity in a population, we use an injection mechanism. We inject a fixed percentage of new randomly initialized programs into the population after every n generations. In the current configuration, we inject 20% new programs every 5 generations.

2.1.8 Survivor Aggregation
The aim of this phase is to collect chromosomes that have qualified to be part of the next generation (parent, offspring, elite, injected) in order to build the population for the next generation. This phase works in two modes: non-injection and injection mode. In the non-injection mode, copies of all parent chromosomes (50%), offspring chromosomes (49%) and elite chromosomes (1%) form the population of the next generation. In the injection mode, since a fixed size population (20%) of new chromosomes is inserted into the population, the top 79% of the parent-offspring population is selected along with the elite set (1%) to form the population of the next generation.

2.1.9 Output (Fittest Individual)
Once the termination criterion has been satisfied, the output of the GP is typically the "fittest" chromosome present in the final population. This chromosome is then chosen to be tested on a set of unseen test images, as explained in Section 2.2. Our aim is to create a pool of such outputs (segmentation programs) which allows us to have multiple segmentation algorithms for the same database. This is created by subsequent runs of the GP.

Note: When we apply percentages, the results are rounded to the closest integers. In case of elitism, if 1% < 1, 1 individual is copied.

2.2 Stage 2: Evaluation Methodology
As mentioned in the previous section, the output of Stage 1 gives us one chromosome, which was the fittest chromosome amongst the population of the final generation. The accuracy of the segmentations produced by this chromosome on the training images is known as the training accuracy of the run. The actual challenge for this individual is to produce similar segmentation accuracies on an unseen set of images known as the validation images.

In order to do this, we randomly select a fixed number of new images from outside the training set along with their corresponding GTs, from the image database. From this point onwards, we call this the validation set. Once the validation set is chosen, the "fittest chromosome" is applied on the entire set of images, one-by-one, and the segmentation accuracy for each image is calculated based on the accuracy formula (1) given in Section 2.1.2. Once this process ends, the average segmentation accuracy of the set, or validation accuracy of the run, is calculated.

We repeat the above process for various runs and calculate the overall training accuracy (average training accuracies of runs) and validation accuracy (average validation accuracies of runs) for the algorithm. From here on, we refer to the above as training accuracy and validation accuracy respectively.

The output of Stage 2 is a chromosome that performs equally well on both training and validation sets and produces high overall validation accuracy.

2.3 Experimental Setup
In order to test the effectiveness and efficiency of our algorithm, we tested the algorithm on a biomedical image database that consisted of HeLa cell images (in culture) of size 512 × 384 pixels. The task of the algorithm was to segment the cells present in the images. The procedure for obtaining results using our algorithm is given in Section 2.3.1.1. We also compare the results of our algorithm with those produced by GENIE Pro. The procedure used for obtaining results using GENIE Pro is given in Section 2.3.1.2. The final parameter values used for GPIS are given in Table 2.

Table 2. Parameter settings for GPIS

Parameter                       Symbol   Value
Population size                 µ        200
Crossover Rate                  Pc       0.45
Swap Mutation Rate              Pms      0.25
Insert Mutation Rate            Pmi      0.25
Delete Mutation Rate            Pmd      0.2
Alter Mutation Rate             Pma      0.7
Scalability factor for length   β        0.005

2.3.1 Procedure for Training and Validation
In order to plan a run of the algorithm, we first decide the size of the training and validation sets. To do so, we define G as the global total number of images in a database, T as the training set, V as the validation set, and R as the number of times optimal individuals are evolved for the same database. The final values for the above used in the present configuration are: G = 1026, T = 30, V = 100 and R = 28.

2.3.1.1 Procedure for Obtaining Results using GPIS
Step 1. Randomly select T images and other V images from the G images in the database.
Step 2. Perform training on T images to choose the fittest individual for validation.
Step 3. Validate this individual on V images to check the applicability of this individual on unseen images. If the individual produces high validation accuracy, save it in the result set, else discard it.
Step 4. Repeat Steps 1 to 3, R times, producing a set of optimal individuals (result set).
Step 5. Calculate values of average training and validation accuracy of the result set.

2.3.1.2 Procedure for Obtaining Results using GENIE Pro
Step 1. Select the same T and V images from the G images in the database, used for the corresponding GPIS run.
Step 2. Load each of the T images as a base image and create a training overlay for each image by marking Foreground (object) and Background (non-object) pixels manually.
Step 3. Train on these manually marked training overlays using the in-built Ifrit Pixel Classifier.
Step 4. Apply the learned solution on V images to produce corresponding segmented images.
Step 5. Calculate validation accuracy for these V images using formula (1).
Step 6. Repeat Steps 1 to 5, R times, as for GPIS.
Step 7. Calculate values of average training and validation accuracy of the result set.

3. RESULTS
We have based our results on two criteria, effectiveness of the algorithm to accurately segment the given images, and efficiency of the algorithm in doing so.

Effectiveness is based on two measures, pixel accuracy of the evolved solution and the cell count rate (percentage of cell structures correctly identified). In order to calculate the cell count rate, we have categorized cells into two types: Type 1 and Type 2. Type 1 cells are those which can be identified by eye with relative ease. Type 2 cells are those which are relatively difficult to identify by eye. We also provide comparative results for effectiveness for GENIE Pro. This is presented in Section 3.1.

Efficiency reflects the time the algorithm takes to produce one individual of acceptable fitness. This is measured in terms of number of generations. These results are presented in Section 3.2. We also briefly discuss one evolved program and provide the segmented images produced. This is presented in Section 3.3 and Figures 5 and 6.

Table 3. Segmentation accuracy: GPIS Vs GENIE Pro

Algorithm    Training Data   Validation Data
GPIS         98.76%          97.01%
GENIE Pro    94.12%          93.12%

Table 4. Cell count rate: GPIS Vs GENIE Pro

                     GPIS                              GENIE Pro
Cell Count Measure   Training Data   Validation Data   Training Data   Validation Data
Detected Cells       98.24%          97.98%            97.02%          96.56%
Type 1 Cells         100%            100%              100%            100%
Type 2 Cells         98.78%          98.22%            97.49%          96.89%
Undetected Cells     1.32%           1.55%             2.12%           2.25%

Table 5. Performance of GPIS based on number of generations

Statistical Measure   Number Of Generations
MEAN                  122.07
MEDIAN                122
STANDARD DEVIATION    6.85
UPPER BOUND           138
LOWER BOUND           112
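The pixel accuracy behind Table 3 can be illustrated with a short Python sketch (Python is used only for illustration; GPIS itself is implemented in MATLAB®). FPR and FNR follow the definitions of Section 2.1.2. The product form of the accuracy is assumed to be the same as that of the cell count rate formula (3), and subtracting β × len as the length penalty is likewise an assumption about the fitness function. The function names and the list-of-rows mask format are hypothetical.

```python
def error_rates(predicted, ground_truth):
    """Return (FPR, FNR) for two binary masks of equal size.

    Masks are lists of rows; a truthy value marks an object pixel.
    """
    tp = fp = tn = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            if g and p:
                tp += 1          # object pixel reported as object
            elif g:
                fn += 1          # object pixel reported as non-object
            elif p:
                fp += 1          # non-object pixel reported as object
            else:
                tn += 1          # non-object pixel reported as non-object
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


def pixel_accuracy(predicted, ground_truth):
    """Accuracy = (1 - FPR)(1 - FNR), assumed to mirror formula (3)."""
    fpr, fnr = error_rates(predicted, ground_truth)
    return (1.0 - fpr) * (1.0 - fnr)


def fitness(predicted, ground_truth, program_length, beta=0.005):
    """Accuracy minus a length penalty (assumed form); beta as in Table 2."""
    return pixel_accuracy(predicted, ground_truth) - beta * program_length


ground_truth = [[1, 1, 0, 0],
                [1, 1, 0, 0]]
predicted = [[1, 1, 1, 0],
             [1, 0, 0, 0]]
# One of four object pixels is missed and one of four non-object pixels is
# wrongly marked, so FPR = FNR = 0.25 and the accuracy is 0.75 * 0.75.
print(pixel_accuracy(predicted, ground_truth))  # 0.5625
```

Under these assumptions, a program of 7 genes with this segmentation would score a fitness of 0.5625 - 0.005 × 7 = 0.5275, which shows how the penalty favours shorter programs of equal accuracy.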
3.1 Effectiveness
Table 3 presents results obtained for training and validation accuracies of segmentation achieved for GPIS and GENIE Pro. These values represent each algorithm’s ability to correctly classify each pixel in an image as an object or non-object pixel. We found that our algorithm performed better in segmenting the cells in the images as compared to GENIE Pro.

The second measure for effectiveness that we used was cell count rate. We extend the concept of TPs, TNs, FPs and FNs to object detection where a TP denotes an object that is correctly identified by the algorithm as a cell, FN denotes a cell incorrectly identified as background, FP denotes a non-object incorrectly identified as a cell, and TN denotes a non-object correctly identified as the background. In order to consider an object as belonging to any of the above four options, a minimum of 70% of object pixels must correspond to that option. Cells identified were manually counted.

Similar to the accuracy formula, based on TPs, TNs, FPs and FNs, we can define the FPR and FNR for cell count. FPR is the proportion of non-cell structures that were erroneously reported as being cell structures. FNR is the proportion of cell structures that were erroneously reported as non-cell structures. The cell count rate formula used is as follows:

Cell Count Rate = (1-FPR) (1-FNR) (3)

3.2 Efficiency
Table 5 reflects the efficiency of the process to produce the required results. We measure efficiency based on the number of generations taken by GPIS to produce one individual of minimum acceptable fitness. This acceptable fitness is 95% training accuracy. In our runs, we observed that GPIS never failed to produce an acceptable individual.

The experiments were performed on an Intel Pentium (R) 4 CPU, 3.06 GHz, 2GB RAM computer. To execute 1 generation, GPIS took on average 4.21 minutes. The average time taken for a complete run was approximately 513 minutes. The maximum time taken for a complete run was 580 minutes.

Since GPIS is designed to run as an offline tool and the time it takes to execute an evolved program is between 1-3 seconds, the period of evolution of an optimal program is within reasonable real world constraints. Also, the standard deviation for the number of generations is low. This shows that GPIS runs consistently to produce an optimal program within a tight window.

3.3 Evolved Program
Figure 5 shows the chromosomal and genomic structure of an evolved program. The program evolved is a combination of filters
7. and morphological operators. The first gene is a 6 6 Gaussian [5] J. M. Daida, J. D. Hommes, T. F. Bersano-Begey,S. J. Ross,
and J. F. Vesecky, ―Algorithm Discovery using the Genetic
low pass filter with a sigma value of 0.8435 followed by a 4 4
Programming Paradigm: Extracting Low-contrast Curvilinear
averaging filter. The output image from gene 2 is eroded with a
Features from SAR Images of Arctic Ice‖, Advances in
flat, disk-shaped structuring element of radius 2. A 6 6 Gaussian
low pass filter with a sigma value of 0.8435 followed by a 4 4 Genetic Programming II, P. J. Angeline, K. E. Kinnear,
(Eds.), Chapter 21, The MIT Press, 1996, pp. 417-442.
averaging filter. The output image from gene 2 is eroded with a
flat, disk-shaped structuring element of radius 2. A 6 6 [6] B. Bhanu, Y. Lin, ―Learning Composite Operators for Object
Detection‖, Proceedings of the Conference on Genetic and
averaging filter is again applied to the output image of the eroded
image. Its output image undergoes a composite morphological Evolutionary Computation, July 2002, pp. 1003–1010.
operation of closing and opening with the same structuring
[7] S. P. Brumby, J. P. Theiler, S. J. Perkins, N. R. Harvey, J. J.
element as above. Finally this image is converted to a binary
Szymanski, and J. J. Bloch, ―Investigation of Image Feature
output image using a threshold of 0.09022. The validation
Extraction by a Genetic Algorithm‖, Proceedings of SPIE,
accuracy is calculated for this image.
Vol. 3812, 1999, pp. 24-31.
Figure 6 shows implementation of this evolved program on two [8] Bhanu, B.; Sungkee Lee; Das, S., ―Adaptive image
validation images along with corresponding results from GENIE segmentation using genetic and hybrid search methods”,
Pro. IEEE Transactions on Aerospace and Electronic Systems,
Vol. 31, Issue 4, Oct 1995 Page(s):1268 – 1291.
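Equation (3) can be checked numerically. The short Python sketch below assumes that FPR and FNR are expressed as fractions in [0, 1]. The function name `cell_count_rate` and the sample error rates are hypothetical and are not taken from our experiments.

```python
def cell_count_rate(fpr, fnr):
    """Equation (3): product of the complements of the two error rates.

    Assumes fpr and fnr are fractions in [0, 1] (an assumption of this sketch).
    """
    for name, value in (("fpr", fpr), ("fnr", fnr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (1.0 - fpr) * (1.0 - fnr)


# Hypothetical error rates, chosen only to illustrate the arithmetic.
rate = cell_count_rate(fpr=0.02, fnr=0.01)
print(f"Cell count rate: {rate:.2%}")
```

With these illustrative values the rate is 0.98 × 0.99 = 0.9702. Because the two terms are multiplied, the rate reaches 100% only when there are neither false positives nor false negatives.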
4. CONCLUSIONS
In this paper, we propose a simple approach to the complex problem of image
segmentation. The proposed algorithm, GPIS, uses genetic programming to evolve
image segmentation programs from a pool of primitive image analysis operators.
The evolved solutions are simple MATLAB® based image segmentation programs
that are easy to read and implement. In addition, the algorithm does not
require any a priori information about the objects to be segmented from the
images. We tested our algorithm on a biomedical image database and compared
the results to those of another GA-based image segmentation algorithm, GENIE
Pro. We found that our algorithm consistently produced better results: both
the segmentation accuracy and the cell count rate were higher than those of
GENIE Pro. GPIS also produced an optimal solution within a reasonable time
window and never failed to produce an optimal solution.
5. ACKNOWLEDGMENTS
We are grateful to Ms Aida Abu-Baker and Ms Janet Laganiere from CHUM Research
Centre, Notre-Dame Hospital, Montreal, for providing us with the images for
the cell database. We would also like to thank Dr James Lacefield from the
University of Western Ontario, London, for his help on this project.
6. REFERENCES
[1] Bhanu, B.; Sungkee Lee; Das, S., ―Adaptive image [15] J. M. Daida, J. D. Hommes, S. J. Ross, A. D. Marshall, and J.
F. Vesecky, ―Extracting Curvilinear Features from SAR
segmentation using genetic and hybrid search methods”,
Images of Arctic Ice: Algorithm Discovery Using the Genetic
IEEE Transactions on Aerospace and Electronic Systems,
Programming Paradigm,‖ Proceedings of the IEEE
Vol. 31, Issue 4, Oct 1995 Page(s):1268 – 1291.
International Geoscience and Remote Sensing Symposium,
[2] B. Bhanu and Y. Lin, ―Object Detection in Multi-modal
Italy, IEEE Press, 1995, pp. 673–75.
Images using Genetic Programming‖, Applied Soft
[16] K. S. Fu, and J. K. Mui, ―A Survey on Image Segmentation‖,
Computing, Vol. 4, Issue 2, 2004, pp. 175-201.
Pattern Recognition, 13, 1981, pp. 3-16.
[3] B. Bhanu, Y. Lin, ―Learning Composite Operators for Object
P. Ghosh and M. Mitchell, ―Segmentation of Medical Images
Detection‖, Proceedings of the Conference on Genetic and
using a Genetic Algorithm‖, Proceedings of the 8th Annual
Evolutionary Computation, July 2002, pp. 1003–1010.
Conference on Genetic and Evolutionary Computation,
[4] S. P. Brumby, J. P. Theiler, S. J. Perkins, N. R. Harvey, J. J.
2006, pp. 1171—1178.
Szymanski, and J. J. Bloch, ―Investigation of Image Feature
Extraction by a Genetic Algorithm‖, Proceedings of SPIE, [17] Harvery, N. Levenson, R. M., Rimm, D. L. Investigation of
automated feature extraction techniques for applications in
Vol. 3812, 1999, pp. 24-31.
cancer derection from multi-spectral histopathology images.
Proceedings of SPIE, Vol. 5032, 2003, 557-556.
8. [18] D. Howard and S. C. Roberts, ―A Staged Genetic [23] M. E. Roberts and E. Claridge, ―An Artificially Evolved
Programming Strategy for Image Analysis‖, Proceedings of Vision System for Segmenting Skin Lesion Images‖,
the Genetic and Evolutionary Computation Conference, Proceedings of the 6th International Conference on Medical
1999, pp. 1047—1052. Image Computing and Computer-Assisted Intervention, Vol.
2878, 2003, pp. 655- 662.
[19] D. Howard, S. C. Roberts, and R. Brankin, ―Evolution of
Ship Detectors for Satellite SAR Imagery‖, Proceedings of [24] W. Tackett, ―Genetic Programming for Feature Discovery
and Image Discrimination‖, In S. Forrest, editor,
EuroGP'99, Vol. 1598, 1999, pp. 135- 148.
Proceedings of 5th International Conference on Genetic
[20] N. R. Pal, and S. K. Pal, ―A Review on Image Segmentation
Algorithm, 1993, pp. 303–311.
Techniques‖, Pattern Recognition, 26, 1993, pp. 1277-1294.
[25] W. Tackett, ―Genetic Programming for Feature Discovery
[21] D. L. Pham, C. Xu, J. L. Prince, ―Survey of Current Methods
and Image Discrimination‖, In S. Forrest, editor,
in Medical Image Segmentation‖, Annual Review of
Proceedings of 5th International Conference on Genetic
Biomedical Engineering, 2, 2000, pp. 315—337.
Algorithm, 1993, pp. 303–311.
[22] R. Poli, ―Genetic Programming for Feature Detection and
[26] Y. J. Zhang, ―Influence of Segmentation over Feature
Image Segmentation‖, T.C. Forgarty (Ed.), Evolutionary
Measurement‖, Pattern Recognition Letters, 16(2), 1992,
Computation, Springer- Verlag, Berlin, Germany, 1996, pp.
201-206.
110–125.
Chromosomal structure:
GAUS | AVER | EROD | AVER | CLOP | THRES

Genomic structure:
[GAUSS, d1, 0, 6, 0.8435] [AVER, io1, 0, 4, 0] [EROD, io2, 0, 0, 1]
[AVER, io3, 0, 6, 0] [CLOP, io4, 0, 0, 1] [THRESH, io5, 0, 0.09022, 0]
(a)

Genomic Structure               MATLAB® Implementation
                                d1 = input;
[GAUSS, d1, 0, 6, 0.8435]       h1 = fspecial('gaussian', [6 6], 0.8435);
                                io1 = imfilter(d1, h1);
[AVER, io1, 0, 4, 0]            h2 = fspecial('average', [4 4]);
                                io2 = imfilter(io1, h2);
[EROD, io2, 0, 0, 1]            SE1 = strel('disk', 2);
                                io3 = imerode(io2, SE1);
[AVER, io3, 0, 6, 0]            h3 = fspecial('average', [6 6]);
                                io4 = imfilter(io3, h3);
[CLOP, io4, 0, 0, 1]            io5 = imclose(io4, SE1);
[THRESH, io5, 0, 0.09022, 0]    output = im2bw(io5, 0.09022);

Segmentation accuracy on validation set: 99.04%; Number of operators used = 6;
Average execution time = 1.252 seconds; Number of generations needed to
converge = 114; Number of fitness evaluations = 10,532
(b)
Figure 5. An evolved program: (a) Chromosomal and genomic structure for the evolved program, (b) Genomic structure and
equivalent MATLAB® implementation of the evolved program with corresponding performance results
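The gene-to-code mapping in Figure 5(b) can be sketched as a small decoder. The Python sketch below is illustrative only and is not the GPIS implementation. It assumes a gene layout of (operator, input image, unused field, parameter 1, parameter 2), read off the figure, and a hypothetical lookup table that maps structuring-element index 1 to `strel('disk', 2)`. The name `decode_genome` is likewise hypothetical. The CLOP gene emits the single `imclose` call shown in the figure.

```python
# Genes of the evolved program, copied from Figure 5.
FIGURE5_GENOME = [
    ("GAUSS", "d1", 0, 6, 0.8435),
    ("AVER", "io1", 0, 4, 0),
    ("EROD", "io2", 0, 0, 1),
    ("AVER", "io3", 0, 6, 0),
    ("CLOP", "io4", 0, 0, 1),
    ("THRESH", "io5", 0, 0.09022, 0),
]


def decode_genome(genome, strel_table=None):
    """Translate gene tuples into MATLAB statements (illustrative sketch)."""
    # Assumed lookup: structuring-element index -> MATLAB constructor call.
    strel_table = strel_table or {1: "strel('disk', 2)"}
    lines = ["d1 = input;"]
    n_filters = 0            # counts filter kernels h1, h2, ...
    defined_strels = set()   # structuring elements already emitted
    for i, (op, src, _unused, p1, p2) in enumerate(genome, start=1):
        # The last gene writes the final result; earlier genes write io<i>.
        out = "output" if i == len(genome) else f"io{i}"
        if op in ("GAUSS", "AVER"):
            n_filters += 1
            h = f"h{n_filters}"
            if op == "GAUSS":
                lines.append(f"{h} = fspecial('gaussian', [{p1} {p1}], {p2});")
            else:
                lines.append(f"{h} = fspecial('average', [{p1} {p1}]);")
            lines.append(f"{out} = imfilter({src}, {h});")
        elif op in ("EROD", "CLOP"):
            se = f"SE{p2}"
            if p2 not in defined_strels:
                lines.append(f"{se} = {strel_table[p2]};")
                defined_strels.add(p2)
            func = "imerode" if op == "EROD" else "imclose"
            lines.append(f"{out} = {func}({src}, {se});")
        elif op == "THRESH":
            lines.append(f"{out} = im2bw({src}, {p1});")
        else:
            raise ValueError(f"unknown operator: {op}")
    return lines


matlab_lines = decode_genome(FIGURE5_GENOME)
print("\n".join(matlab_lines))
```

Each gene is decoded independently of the others. This is what lets a linear chromosome be read off directly as a sequence of MATLAB statements, with each gene consuming the output image of an earlier one.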
(a) (b) (c) (d)
Figure 6. (a) Segmentation produced by GPIS using evolved program shown above on validation image 1 (Validation Accuracy =
99.21%, Cell Count Rate = 100%), (b) Segmentation produced by GENIE Pro on validation image 1 (Validation Accuracy =
95.46%, Cell Count Rate = 97.89%), (c) Segmentation produced by GPIS using evolved program shown above on validation image
2 (Validation Accuracy = 98.93%, Cell Count Rate = 100%), (d) Segmentation produced by GENIE Pro on validation image 2
(Validation Accuracy = 94.22%, Cell Count Rate = 96.45%)