Image Feature Extraction for Plankton Classification
An exploration of image feature extraction and classification on
large oceanographic data
June 3, 2015
Author
KEVIN PARK
Oregon State University
Corvallis
KHP
2015
KIDDER HALL PRESS
Contents
1 Introduction
2 Edge detection
3 Feature Extraction
3.1 Shape Analysis
3.2 Histogram Method
4 Classification
4.1 Random Forest
5 Results & Discussion
6 Future Work
7 References
1 Introduction
Plankton, perhaps surprisingly, form a critical link in the global ecosystem and are a fun-
damental source of food and energy for aquatic wildlife. As such, the population levels of
plankton are an ideal metric for determining the health and viability of oceans and aquatic
ecosystems. The challenge thus becomes determining the best way to classify and count the
multitude of phytoplankton and zooplankton species from a sample of ocean water. Modern
imaging systems can easily produce hundreds of thousands of images in a short time scale,
so classifying them by hand is daunting and often of minimal utility.
An open competition called the National Data Science Bowl was hosted by Booz Allen
Hamilton and Kaggle. A training set of 30,337 labeled images of plankton (121
classes) and a test set of 130,400 images were provided by the Hatfield Marine
Science Center at Oregon State University. The goal of this competition was to find an
algorithm that can properly classify the different species of plankton, and substantial
prizes were awarded to the three teams that built the best algorithms. The competition ended in
March, but individuals can still submit results from their algorithms to see how
they would have performed had they participated in the competition.
The goal of my project is to construct an accurate contour of each plankton, extract
geometric properties from that contour, and examine how well those properties distinguish the different species of plankton.
2 Edge detection
The images of the different plankton species vary in shape and intensity, which makes it
challenging to produce an edge that captures the shape of the plankton. We found that
combining popular edge detection methods (Canny, Roberts, and Sobel) provides more
detail than using any one method alone.
Roberts Edge Detection
The Roberts edge detection algorithm was developed in 1963 [1], and at the time it was
difficult to implement and not widely used because of the lack of computing power. As
computational power increased, so did its popularity in edge detection. The method
involves only a few steps. The first is to compute the gradient of the original image
at each pixel by convolving it with the following kernels,
\[ G_x = \begin{pmatrix} +1 & 0 \\ 0 & -1 \end{pmatrix} \quad \text{and} \quad G_y = \begin{pmatrix} 0 & +1 \\ -1 & 0 \end{pmatrix}. \]
The magnitude of the gradient is computed for each pixel,
\[ I(x, y) = G(x, y) = \sqrt{G_x^2 + G_y^2}, \]
and the gradient direction,
\[ \Theta(x, y) = \arctan\!\left( \frac{G_y}{G_x} \right). \]
The two results are then combined to produce what are called the Roberts edges. The main
disadvantage of Roberts edge detection is its sensitivity to noise.
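As an illustration of the steps above (not the code used in this project), the Roberts gradient can be computed with two small convolutions; NumPy and SciPy are assumed to be available and image is a 2-D grayscale array.

import numpy as np
from scipy.signal import convolve2d

def roberts_gradient(image):
    """Convolve with the Roberts cross kernels and return the
    gradient magnitude and direction at each pixel."""
    kx = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
    ky = np.array([[0.0, 1.0],
                   [-1.0, 0.0]])
    gx = convolve2d(image, kx, mode="same")
    gy = convolve2d(image, ky, mode="same")
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    direction = np.arctan2(gy, gx)
    return magnitude, direction

Thresholding the magnitude would then give a binary edge map; the threshold is a free choice not specified in the report.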
Sobel Edge Detection
Sobel edge detection is similar to the Roberts method. The only difference is the pair of kernels
used to produce the gradient,
\[ G_x = \begin{pmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{pmatrix} \quad \text{and} \quad G_y = \begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{pmatrix}. \]
Like Roberts, the Sobel method is also sensitive to noise.
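A minimal sketch using scikit-image (an assumption; the report does not name a library) that applies the 3x3 Sobel kernels and thresholds the resulting magnitude; the threshold value is illustrative only.

from skimage.filters import sobel

def sobel_edge_map(image, threshold=0.1):
    """Gradient magnitude from the 3x3 Sobel kernels, thresholded
    into a binary edge map; image is grayscale in [0, 1]."""
    magnitude = sobel(image)
    return magnitude > threshold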
Canny Edge Detection
The most popular edge detection algorithm is the Canny algorithm, developed in
1986 [2]. The difference from the Roberts and Sobel methods is the additional denoising
(smoothing) of the image before the edges are formed, and the elimination of isolated edges afterwards. It
has been shown that under most conditions the Canny algorithm performs better than
Roberts, Sobel, and other methods [1]. Currently, it is the benchmark against which
new edge detection methods are compared.
The Canny algorithm is completed in five steps.
1. First we denoise the image with a Gaussian filter with some fixed parameter (σ).
2. Compute the gradient of the image, with
\[ G_x = \begin{pmatrix} -1 & +1 \\ -1 & +1 \end{pmatrix} \quad \text{and} \quad G_y = \begin{pmatrix} +1 & +1 \\ -1 & -1 \end{pmatrix}. \]
3. Determine the local maxima as edges.
4. Eliminate any false edges such as isolated pixels.
5. Finally, fill gaps between edges by thresholding.
We vary the smoothing parameter (σ) from 1.5 to 3 in increments of 0.5 (∆σ = 0.5) and
then compute the mean of the edge maps produced.
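A sketch of this σ sweep using scikit-image's canny (assumed here; the project's actual implementation is not given in the report):

import numpy as np
from skimage.feature import canny

def mean_canny_edges(image, sigmas=(1.5, 2.0, 2.5, 3.0)):
    """Run Canny at each smoothing level and average the binary
    edge maps, giving a soft edge image with values in [0, 1]."""
    edge_maps = [canny(image, sigma=s).astype(float) for s in sigmas]
    return np.mean(edge_maps, axis=0)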
Combining edge detectors
The edges produced by Roberts and Sobel were combined with the edges from the Canny
algorithm. The combination of these methods provided accurate shapes for both simple and
complex plankton.
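The report does not state the exact combination rule, so the following is only one plausible sketch: take the union of the averaged Canny edges with a thresholded Roberts or Sobel gradient magnitude map.

import numpy as np
from skimage.feature import canny
from skimage.filters import roberts, sobel

def combined_edges(image, detector=sobel, threshold=0.1,
                   sigmas=(1.5, 2.0, 2.5, 3.0)):
    """Union of the mean Canny edge map with a thresholded
    Roberts or Sobel gradient magnitude map."""
    # detector can be skimage.filters.sobel or skimage.filters.roberts
    canny_mean = np.mean([canny(image, sigma=s) for s in sigmas], axis=0)
    other = detector(image) > threshold
    return (canny_mean > 0) | other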
Figure 1: Original images of the plankton species Acantharia Protist (A. Protist),
Decapods, Detritus Blob (D. Blob), and Trichodesmium Bowtie (T. Bowtie), shown
alongside the edges produced by taking the mean of the Canny edge maps.
Figure 2: These edges are produced from combining Canny and Roberts.
Figure 3: These edges are produced from combining Canny and Sobel.
Although it is difficult to see, the edges produced by the Canny and Roberts combination and
the Canny and Sobel combination differ from each other.
3 Feature Extraction
3.1 Shape Analysis
After forming an edge map for each plankton image, we extracted several geometric properties.
Each geometric feature has an accompanying figure (Figures 4-13) showing four species (A.
Protist, Decapods, D. Blob, and T. Bowtie) to demonstrate the similarities and differences
in their distributions. The distributions on the left represent the features extracted from
the combined Canny and Sobel edges, and those on the right from the combined Canny and Roberts edges.
3.1.1 Area
The area is computed by counting the number of pixels that make up the object.
Figure 4: The distributions of Area for Decapods and D. Blob are different. However,
there is little difference between A. Protist and T. Bowtie.
This method of computing the area of the object is not scale or rotation invariant.
3.1.2 Perimeter
The perimeter is computed by counting the number of pixels on the boundary of the object.
Figure 5: The perimeter distributions do not seem to differ much from each other,
apart from D. Blob. This is most likely due to the image size.
This method of computing the perimeter is not scale or rotation invariant.
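A sketch of extracting the area (3.1.1) and perimeter (3.1.2) from a binary object mask with scikit-image's regionprops; it assumes the combined edge map can be filled into a solid mask, which may not exactly match the project's pixel-counting convention.

from scipy.ndimage import binary_fill_holes
from skimage.measure import label, regionprops

def area_and_perimeter(edge_map):
    """Fill the closed edges into a solid mask, keep the largest
    connected region, and return its pixel area and perimeter."""
    mask = binary_fill_holes(edge_map)
    region = max(regionprops(label(mask)), key=lambda r: r.area)
    return region.area, region.perimeter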
3.1.3 Major and Minor Axes
The Major Axis is the longest straight line in the object and the Minor Axis is the longest
straight line perpendicular to the Major Axis.
Figure 6: The distribution for the Major Axis length seems to be different between A. Protist
and T. Bowtie.
Figure 7: The distribution for the Minor Axis length is different between A. Protist and D.
Blob.
The Major and Minor Axis lengths are rotation invariant, but not scale invariant.
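One common way to realize this definition (assumed here, not necessarily the project's) is to take the axis lengths of the ellipse with the same second moments as the object, which regionprops reports directly.

from skimage.measure import label, regionprops

def axis_lengths(mask):
    """Major and minor axis lengths of the largest object in a
    binary mask, from the moment-matched ellipse."""
    region = max(regionprops(label(mask)), key=lambda r: r.area)
    return region.major_axis_length, region.minor_axis_length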
3.1.4 Convexity
The convexity is defined as the ratio of the perimeter of the convex hull to the perimeter of
the original object,
\[ \text{Convexity} = \frac{\text{Perimeter of the Convex Hull}}{\text{Perimeter of the Object}}. \]
The convex hull is the smallest convex region that encloses the original object. If
Convexity ≈ 1, then the object is convex.
Figure 8: Compared to the previous features, the distributions of convexity for these
four species appear more distinct.
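A sketch of the convexity ratio, assuming the convex hull is produced with convex_hull_image and the perimeters come from regionprops:

from skimage.measure import label, regionprops
from skimage.morphology import convex_hull_image

def convexity(mask):
    """Perimeter of the convex hull divided by the perimeter of the
    object; approximately 1 for convex shapes."""
    obj = max(regionprops(label(mask)), key=lambda r: r.area)
    hull = regionprops(label(convex_hull_image(mask)))[0]
    return hull.perimeter / obj.perimeter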
3.1.5 Compactness
Compactness gives us information on how circular the object is and is computed from the
Area and Perimeter of the object,
\[ \text{Compactness} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2}. \]
The closer the compactness is to 1, the more circular the object. For example, if
Area = πr² and Perimeter = 2πr, then the compactness is 1.
Figure 9: Similar to the previous features, it appears that A. Protist and T. Bowtie are the
most different.
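Compactness follows directly from the area and perimeter already computed; a one-function sketch:

import numpy as np

def compactness(area, perimeter):
    """4*pi*Area / Perimeter^2; equals 1 for a perfect circle and
    decreases as the shape becomes less circular."""
    return 4.0 * np.pi * area / perimeter ** 2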
3.1.6 Eccentricity
The eccentricity is computed by first fitting an ellipse around the object and taking the
ratio of the length of the major axis to the length of the minor axis,
\[ \text{Eccentricity} = \frac{\text{Length of the Major Axis}}{\text{Length of the Minor Axis}}. \]
Figure 10: The distribution of Eccentricity for Decapods seems to differ from the other three
plankton species.
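Using the report's definition, eccentricity is simply the axis-length ratio from Section 3.1.3; note that this differs from the foci-based eccentricity in [0, 1) that some image libraries report.

def eccentricity(major_axis_length, minor_axis_length):
    """Ratio of the fitted ellipse's axis lengths (>= 1), per the
    definition above."""
    return major_axis_length / minor_axis_length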
3.1.7 Solidity
Solidity measures whether the object is concave or convex. It is computed as the ratio of the
area of the original object to the area of the fitted convex hull,
\[ \text{Solidity} = \frac{\text{Area of the Shape}}{\text{Area of the Convex Hull}}. \]
The solidity of a convex object is 1.
Figure 11: The distribution of Solidity is different between A. Protist and D. Blob, while
the distributions for Decapods and T. Bowtie are similar.
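regionprops also exposes solidity directly as the object area over the convex hull area; a minimal sketch, under the same mask assumption as above:

from skimage.measure import label, regionprops

def solidity(mask):
    """Area of the object divided by the area of its convex hull;
    equals 1 for convex objects."""
    region = max(regionprops(label(mask)), key=lambda r: r.area)
    return region.solidity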
3.1.8 Mean Curvature
We computed the curvature by segmenting the boundary of each object. To the set of points
that make up each segment, a 2nd-degree polynomial was fitted, and the coefficient of the
quadratic term was used to measure curvature. The mean curvature was computed by taking
the mean of these coefficients over all segments.
Figure 12: The distributions of mean curvature for the four plankton species appear to be
similar.
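A sketch of the segment-wise polynomial fit; the boundary extraction, the segment length, and the parametric fit against the arc index are assumptions made here for a runnable example (the report fits a single 2nd-degree polynomial per segment without giving these details).

import numpy as np
from skimage.measure import find_contours

def mean_curvature(mask, segment_length=20):
    """Split the object boundary into fixed-length segments, fit a
    2nd-degree polynomial to each coordinate of each segment, and
    average the quadratic coefficients as a rough curvature measure."""
    boundary = max(find_contours(mask.astype(float), 0.5), key=len)
    coefficients = []
    for start in range(0, len(boundary) - segment_length, segment_length):
        segment = boundary[start:start + segment_length]
        t = np.arange(len(segment))
        # quadratic coefficient of row and column coordinates vs. arc index
        coefficients.append(np.polyfit(t, segment[:, 0], 2)[0])
        coefficients.append(np.polyfit(t, segment[:, 1], 2)[0])
    return float(np.mean(coefficients)) if coefficients else 0.0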
3.1.9 Skeleton
A skeleton is fitted inside the object, and the number of branching nodes is then
counted.
Figure 13: The distributions of the number of branch points are similar between A. Protist
and D. Blob, and likewise between Decapods and T. Bowtie. Interestingly, these
two groups differ from each other.
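A sketch of the branch-point count, assuming skeletonize from scikit-image and treating any skeleton pixel with three or more skeleton neighbours as a branching node.

import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import skeletonize

def branch_point_count(mask):
    """Skeletonize the object and count skeleton pixels that have
    three or more skeleton neighbours."""
    skeleton = skeletonize(mask)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]])
    neighbours = convolve(skeleton.astype(int), kernel, mode="constant")
    return int(np.sum(skeleton & (neighbours >= 3)))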
3.2 Histogram Method
The first feature extraction method we developed measures the distribution of grayscale
values that make up the shape and texture of each plankton. Grayscale values are scaled between
0 and 1, where 1 is white and 0 is black. A count is made of the grayscale values that fall
into each of the intervals [0, 0.1), [0.1, 0.2), . . . , [0.9, 1).
Figure 14: The grayscale distributions on the right correspond to the species selected on
the left. The species of plankton are Copepod Calanoid, Echinoderm Larva Pluteus Early,
Ctenophore Cydippid no tentacles, and Jellies Tentacles.
For certain species of plankton, the distributions of grayscale values differ from each other,
while a few species have nearly identical distributions.
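A sketch of the ten-bin grayscale histogram feature; the image is assumed to already be a grayscale array scaled to [0, 1].

import numpy as np

def grayscale_histogram(image, n_bins=10):
    """Counts of pixel intensities falling into [0, 0.1), ..., [0.9, 1]."""
    counts, _ = np.histogram(image.ravel(), bins=n_bins, range=(0.0, 1.0))
    return counts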
4 Classification
4.1 Random Forest
After extracting the features from the Canny & Sobel and Canny & Roberts edges, a random
forest model was fitted on the training set with different numbers of trees, from 500 to
6000. The model's performance was then evaluated on the test set.
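A sketch of the fitting loop with scikit-learn (an assumption; the report does not name the implementation). X_train, y_train, and X_test stand for the extracted feature matrices and the 121-class labels, which are not shown here.

from sklearn.ensemble import RandomForestClassifier

def fit_forests(X_train, y_train, X_test,
                tree_counts=(500, 1000, 2000, 4000, 6000)):
    """Fit one random forest per tree count and return the predicted
    class probabilities on the test set for each setting."""
    probabilities = {}
    for n_trees in tree_counts:
        model = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1)
        model.fit(X_train, y_train)
        probabilities[n_trees] = model.predict_proba(X_test)
    return probabilities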
5 Results & Discussion
The score for plankton classification was measured with a multi-class logarithmic loss. For
each image in the test set there is a true class label. The formula used is,
\[ \text{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij}), \]
where N is the number of images in the test set, M is the number of class labels, $y_{ij}$ is 1 if
image i belongs to class j and 0 otherwise, and $p_{ij}$ is the predicted probability that image i
belongs to class j. The score is "conveniently" evaluated on Kaggle's website.
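The same loss can also be checked locally; a NumPy sketch that implements the formula above (with a small clip to avoid taking log(0)):

import numpy as np

def multiclass_logloss(y_true, probabilities, eps=1e-15):
    """y_true: integer class labels of shape (N,); probabilities:
    predicted class probabilities of shape (N, M)."""
    p = np.clip(probabilities, eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize after clipping
    n = len(y_true)
    return -np.mean(np.log(p[np.arange(n), y_true]))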
If we were to randomly guess (with equal probability) the class of each image (1 of the
121), we would obtain a benchmark score of 4.795791. Our random forest model with the 21
features performed better than the benchmark score, and as we increased the number of trees
the score improved and converged around 1.77. The features extracted using the Canny &
Sobel edges performed better than those from the Canny & Roberts edges. We conclude that the intensity values
and geometric properties of plankton distinguish the different species.
6 Future Work
The proper fitting of a classification model was not fully explored and should be. There is
plenty of room to improve the current feature extraction and to extract additional geometric
features.
From our results, we found that combinations of edge detection methods produced the
desired results. One possibility is to develop an algorithm that builds quality edges directly from
the combined Sobel and Canny output.
We can also focus on representing some of the extracted features more accurately, such as the mean
curvature. The method we used simply fitted a second-degree polynomial, and we could explore
other types of fitting, such as splines, to measure the curvature. We can extract additional
geometric features that are described in A Survey of Shape Feature Extraction Techniques
[3].
There was no tuning of the random forest model; the only parameter we varied was
the number of trees. We can further explore how the model performs if other parameters are
changed, such as the number of nodes or whether the trees are pruned.
Additionally, we could fit a tree ensemble using boosting instead of bagging (which the random
forest we fitted relies on).
7 References
1. R. Maini and H. Aggarwal. Study and Comparison of Various Image Edge Detection
Techniques. CSC Journals, vol. 3, pp. 1-60, 2009.
2. J. Canny. A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 8, pp. 679-697, 1986.
3. M. Yang, K. Kpalma, and J. Ronsin. A Survey of Shape Feature Extraction Techniques.
In P.-Y. Yin (ed.), Pattern Recognition, IN-TECH, pp. 43-90, 2008.
4. C. Cheng, W. Liu, and H. Zhang. Image Retrieval Based on Region Shape Similarity.
In 13th SPIE Symposium on Electronic Imaging, Storage, and Retrieval for Image and
Video Databases, 2001.
5. M. Peura and J. Iivarinen. Efficiency of Simple Shape Descriptors. In Proc. 3rd
International Workshop on Visual Form (IWVF3), May 1997.