This presentation covers skin detection in images using random forests. The agenda includes an overview of the skin detection scheme, a refresher on random forests and their R support, and results and continuing work. The code and the dataset are available online. According to the original authors, random forests outperform the other models they tried for skin detection. The presenter's own results are incomplete because of a small training set, and the presenter plans to use a parallel computing cluster going forward.
1. Auro Tripathy
auro@shatterline.com
*Random Forests are registered trademarks of Leo Breiman and Adele Cutler
2. Attributions, code and dataset location (1 minute)
Overview of the scheme (2 minutes)
Refresher on Random Forest and R support (2 minutes)
Results and continuing work (1 minute)
Q&A (1 minute and later)
4. R code available here; my contribution
http://www.shatterline.com/SkinDetection.html
Data set available here
http://www.feeval.org/Data-sets/Skin_Colors.html
Permission to use may be required
5. All training sets are organized as a two-movie sequence
1. A movie sequence of frames in color
2. A corresponding sequence of frames in binary black-and-white, the ground truth
Extract individual frames in JPEG format using ffmpeg, a transcoding tool
ffmpeg -i 14.avi -f image2 -ss 1.000 -vframes 1 14_500offset10s.jpeg
ffmpeg -i 14_gt_500frames.avi -f image2 -ss 1.000 -vframes 1 14_gt_500frames_offset10s.jpeg
6. Image Ground-truth
The original authors used 8991 such image-pairs, the image along with
its manually annotated pixel-level ground-truth.
8. Skin-color classification/segmentation
Uses the Improved Hue, Saturation, Luminance (IHLS) color space
RGB values are transformed to IHLS
IHLS values are used as feature vectors
Original authors also experimented with
Bayesian network,
Multilayer Perceptron,
SVM,
AdaBoost (Adaptive Boosting),
Naive Bayes,
RBF network
“Random Forest shows the best performance in terms of accuracy,
precision and recall”
9. The most important property of this [IHLS] space is a “well-behaved” saturation coordinate which, in contrast to commonly used ones, always has a small numerical value for near-achromatic colours, and is completely independent of the brightness function
A 3D-polar Coordinate Colour Representation Suitable for Image Analysis, Allan Hanbury and Jean Serra
MATLAB routines implementing the RGB-to-IHLS and IHLS-to-RGB conversions are available at http://www.prip.tuwien.ac.at/~hanbury.
R routines implementing the RGB-to-IHLS and IHLS-to-RGB conversions are available at http://www.shatterline.com/SkinDetection.html
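As a rough illustration of the RGB-to-IHLS step, here is a minimal base-R sketch. It is not the routine from the pages linked above. The function name rgb_to_ihls is hypothetical, the luminance weights are Rec. 709-style values assumed for illustration (the paper's exact coefficients may differ slightly), and saturation is taken as max minus min of the three channels, which gives the small values for near-achromatic colours described in the quote.

```r
# Sketch of an RGB-to-IHLS conversion under the assumptions stated above.
# rgb: numeric matrix with columns R, G, B scaled to [0, 1], one row per pixel.
rgb_to_ihls <- function(rgb) {
  r <- rgb[, 1]; g <- rgb[, 2]; b <- rgb[, 3]

  # Luminance: weighted sum of the channels (assumed Rec. 709-style weights)
  lum <- 0.2126 * r + 0.7152 * g + 0.0722 * b

  # Saturation: max minus min, so greys get 0 regardless of brightness
  sat <- pmax(r, g, b) - pmin(r, g, b)

  # Hue: angle in degrees; undefined for greys, set to 0 by convention here
  denom <- sqrt(pmax(r^2 + g^2 + b^2 - r * g - r * b - g * b, 0))
  ratio <- ifelse(denom > 0, (r - 0.5 * g - 0.5 * b) / denom, 1)
  ratio <- pmin(pmax(ratio, -1), 1)          # guard against rounding error
  hue <- acos(ratio) * 180 / pi
  hue <- ifelse(b > g, 360 - hue, hue)
  hue[denom == 0] <- 0

  cbind(H = hue, L = lum, S = sat)
}

pixels <- rbind(red  = c(1, 0, 0),
                blue = c(0, 0, 1),
                grey = c(0.5, 0.5, 0.5))
ihls <- rgb_to_ihls(pixels)
print(ihls)
```

The three columns of the result are what the scheme would feed to the classifier as a per-pixel feature vector.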
10. Package 'ReadImages'
This package provides functions for reading JPEG and PNG files
Package 'randomForest'
Breiman and Cutler's classification and regression based on a forest of trees using random inputs.
Package 'foreach'
Support for the foreach looping construct
Stretch goal to use %dopar%
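The %dopar% stretch goal could look roughly like the sketch below. It assumes the doParallel package as the parallel backend (the slides do not name one), uses the built-in iris data as a stand-in for the pixel table, and picks a worker count of 2 arbitrarily. Each worker grows a small forest, and randomForest's combine() merges them into one.

```r
library(foreach)
library(doParallel)    # assumed backend; not named in the slides
library(randomForest)

cl <- makeCluster(2)   # two local workers, an arbitrary choice
registerDoParallel(cl)

# Grow four forests of 25 trees each in parallel, then merge them.
# iris is only stand-in data for the IHLS pixel features and labels.
rf.parallel <- foreach(ntree = rep(25, 4),
                       .combine = randomForest::combine,
                       .packages = "randomForest") %dopar% {
  randomForest(iris[, 1:4], y = iris$Species, ntree = ntree)
}

stopCluster(cl)
print(rf.parallel$ntree)   # total number of trees in the merged forest
```

Swapping %dopar% back to %do% runs the same loop sequentially, which is convenient for debugging.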
11. library(foreach)
library(randomForest)
set.seed(371)
# Grow one small forest per training frame and merge them with combine()
skin.rf <- foreach(i = seq_len(nrow(training.frames.list)), .combine = combine,
                   .packages = 'randomForest') %do%
{
  # Read the image
  # Transform from RGB to IHLS
  # Read the corresponding ground-truth image
  # Data is ready, now apply random forest (not using the formula interface)
  randomForest(table.data, y = table.truth, mtry = 2, importance = FALSE,
               proximity = FALSE, ntree = 10, do.trace = 100)
}
# Classify the pixels of a held-out test frame
table.pred.truth <- predict(skin.rf, test.table.data)
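Once predictions exist, they can be scored against the ground truth with the accuracy, precision and recall measures quoted earlier. The sketch below uses base R only; the vectors truth and pred are small hypothetical stand-ins for per-pixel ground-truth labels and predictions, and the label names are assumptions.

```r
# Hypothetical per-pixel labels; real data would come from the frames
truth <- factor(c("skin", "skin", "skin", "nonskin",
                  "nonskin", "nonskin", "nonskin", "skin"),
                levels = c("nonskin", "skin"))
pred  <- factor(c("skin", "skin", "nonskin", "nonskin",
                  "skin", "nonskin", "nonskin", "skin"),
                levels = c("nonskin", "skin"))

# Confusion matrix: rows are predictions, columns are ground truth
confusion <- table(predicted = pred, actual = truth)

tp <- confusion["skin", "skin"]        # skin pixels correctly found
fp <- confusion["skin", "nonskin"]     # background labelled as skin
fn <- confusion["nonskin", "skin"]     # skin pixels missed

accuracy  <- sum(diag(confusion)) / sum(confusion)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

print(confusion)
print(c(accuracy = accuracy, precision = precision, recall = recall))
```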
13. Have lots of decision-tree learners
Each learner's training set is sampled independently, with replacement
Add more randomness: at each node of the tree, the splitting attribute is selected from a randomly chosen sample of attributes
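The two sources of randomness can be sketched in a few lines of base R. The sizes and feature names below are hypothetical; m = 2 simply mirrors the mtry = 2 setting used earlier.

```r
set.seed(42)

n_cases  <- 10                  # N: number of training cases (hypothetical)
features <- c("H", "L", "S")    # M = 3 input variables
m        <- 2                   # attributes tried at each node

# Randomness 1: each tree trains on N cases drawn with replacement
bootstrap_rows <- sample(n_cases, size = n_cases, replace = TRUE)

# Cases never drawn for this tree are left out of its training set
left_out <- setdiff(seq_len(n_cases), bootstrap_rows)

# Randomness 2: at a node, only m randomly chosen attributes compete
split_candidates <- sample(features, size = m)

print(bootstrap_rows)
print(left_out)
print(split_candidates)
```

In a real forest the second draw is repeated at every node of every tree, while the first is done once per tree.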
14. Each decision tree votes for a classification
Forest chooses the classification with the most votes
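Voting can be illustrated with a small base-R sketch. The vote matrix is hypothetical: each row is a pixel and each column is one tree's predicted class. The helper majority_vote is an illustrative name, not part of any package.

```r
# Hypothetical votes from three trees for three pixels
votes <- matrix(c("skin",    "skin",    "nonskin",
                  "nonskin", "nonskin", "skin",
                  "skin",    "skin",    "skin"),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("pixel", 1:3), paste0("tree", 1:3)))

# Return the class with the most votes (first class wins on a tie)
majority_vote <- function(v) names(which.max(table(v)))

forest_class <- apply(votes, 1, majority_vote)
print(forest_class)
```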
15. Quick training phase
Trees can grow in parallel
Trees have attractive computing properties
For example…
Computation cost of making a binary tree is low: O(N log N)
Cost of using a tree is even lower: O(log N)
N is the number of data points
Applies to balanced binary trees; decision trees are often not balanced
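To put the growth rates in perspective, the sketch below tabulates them for a few hypothetical data sizes. It assumes a perfectly balanced binary tree, which, as noted, real decision trees often are not, so the figures are best-case illustrations rather than measurements.

```r
n_points <- c(1e3, 1e6, 1e9)   # hypothetical numbers of data points

# Depth of a balanced binary tree: comparisons needed to classify one point
lookup_steps <- ceiling(log2(n_points))

# Order-of-magnitude work to build the tree, proportional to N log N
build_work <- n_points * log2(n_points)

print(data.frame(N = n_points,
                 lookup_steps = lookup_steps,
                 build_work = build_work))
```

Going from a thousand to a billion points multiplies the data by a million but only triples the number of comparisons per lookup.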
Editor's notes
I'm the opening act before the real show. An opening act or warm-up act (in British English and Australia, supporting act) is an entertainer, musician, band, or entertainment act that performs at a concert before the featured (or headline) entertainer/musician(s). Rarely, an opening act may perform again at the end of the concert. The opening act's performance serves to "warm up" the audience, making it appropriately excited and enthusiastic for the headliner.
How many of you were in the previous MeetUp? Thank the organizers
The original implementation used in the paper was probably in MATLAB.
R provides libraries to read JPEG – no surprise there
Random forest is an ensemble classifier having a quick training phase and a very high generalization accuracy [10, 11, 12]. It is successfully used in image classification [13], image matching [14], segmentation [15] and gesture recognition [16].
Why do you need the IHLS-to-RGB conversion?
Is anyone aware of a color-space conversion library?
What's the theory? If we take a large collection of very poor learners (weak learners, in the jargon), each performing only slightly better than chance, then by "putting them together" it is possible to make an ensemble learner that can perform arbitrarily well. For growing trees, if the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree. If there are M input variables, a number m << M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing. Each tree is grown to the largest extent possible. There is no pruning. For classification, the final selection by the forest is based on the maximum voting among the trees.
For classification, the final selection by the forest is based on the maximum voting among the trees.