Background
Clustering and Ensemble Diversity
Outlier Scoring and Ensemble Diversity
Summary and Future Work
Unsupervised Learning Techniques for Diversifying
and Pruning Random Forest
Dr Mohamed Medhat Gaber
School of Computing Science and Digital Media
Robert Gordon University
27 January 2015
Dr Mohamed Medhat Gaber Diversifying and Pruning Random Forest
Acknowledgement
Work done in collaboration with PhD student Khaled Fawagreh
and co-supervisor Dr Eyad Elyan
1 Background
Data Classification
Ensemble Classification
Ensemble Diversity
Random Forests
2 Clustering and Ensemble Diversity
CLUB-DRF
Experimental Study
3 Outlier Scoring and Ensemble Diversity
LOFB-DRF
Experimental Study
4 Summary and Future Work
Summary
Future Work
What is Data Classification?
Data classification is the process of assigning a class
(labelling) to a data instance, based on the values of a set of
predictive attributes (features).
The process has two stages:
1 Model construction: potentially a large number of “labelled”
instances are fed to a classification technique to build a model
(classifier).
2 Model usage: once the model is constructed, it can be
deployed and used to classify “unlabelled” instances.
A large number of techniques have been proposed to address the
data classification task (e.g., decision trees, artificial
neural networks, and support vector machines).
Predictive accuracy has been the major concern when designing a
new classification technique, followed by the time needed for
model construction and usage.
Decision Tree Classification Techniques
Almost all decision trees are constructed using a similar
procedure
Attributes (features) are represented in internal nodes, with
their values given on the links for tree traversal (a variation
of this exists for binary decision trees)
Leaf nodes are class labels
Decision trees mainly vary in the goodness measure used to
find the best attribute to split on (e.g., information gain, gain
ratio, Gini index, and Chi-square)
The first attribute, which is called the root, is the best
attribute (according to some goodness measure) to split on
Subtrees are then built iteratively, finding the best attribute
(or attribute = value pair) to split on at each iteration
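The split-selection step above can be sketched as follows. This is a minimal illustration using information gain as the goodness measure; the helper names (`entropy`, `information_gain`, `best_split_attribute`) are illustrative, not from the slides:

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label array, in bits
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    # Entropy reduction achieved by splitting `labels` on the values of `feature`
    feature, labels = np.asarray(feature), np.asarray(labels)
    gain = entropy(labels)
    for v in np.unique(feature):
        mask = feature == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def best_split_attribute(X, y):
    # The root is simply the attribute with the highest goodness score
    return max(range(X.shape[1]), key=lambda f: information_gain(X[:, f], y))
```

The same loop, applied recursively to each subset of the data, yields the iterative subtree-building procedure described above.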
Ensemble Classification
Combining a number of classifiers that vote towards the winning
class has been thoroughly investigated by the machine learning
and data mining communities.
Bagging, boosting and stacking are among the major
approaches to building ensembles of classifiers.
Bagging uses bootstrap sampling to generate diverse samples
of the dataset.
Boosting builds classifiers in a sequence, encouraging later
classifiers to specialise in instances that earlier classifiers
in the sequence misclassified.
Stacking uses a hierarchy of classifiers that generates a new
dataset on which a single classifier is built.
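The bagging approach above can be sketched as follows, a minimal illustration assuming scikit-learn decision trees as the base learners (the function names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=15, seed=0):
    # Each member is trained on a bootstrap sample (drawn with replacement)
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    # Combine members by voting towards the winning class
    preds = np.stack([t.predict(X) for t in trees]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```

Because each bootstrap sample repeats some instances and omits others, the resulting trees differ, which is precisely the diversity bagging relies on.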
Diversity and Predictive Accuracy
Diversity among members of the ensemble is key to predictive
accuracy
There are many ways to measure such diversity; it is not a
straightforward process
Regardless of the measure used, diversity has been the target
of a number of ‘diversity creation’ methods
Bagging and boosting enforce diversity by input manipulation
Stacking typically imposes diversity by using a number of
different classifiers
Error-correcting output codes manipulate the output to create diversity
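As one concrete example of the many possible diversity measures, the pairwise disagreement measure below computes the mean fraction of instances on which a pair of ensemble members disagree (a minimal sketch; the function name is illustrative):

```python
import numpy as np
from itertools import combinations

def pairwise_disagreement(preds):
    # preds: (n_classifiers, n_instances) array of predicted labels.
    # Returns the average, over all pairs of classifiers, of the fraction
    # of instances on which the pair disagrees:
    # 0 = all members identical, 1 = every pair disagrees everywhere.
    preds = np.asarray(preds)
    pairs = list(combinations(range(len(preds)), 2))
    return float(np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs]))
```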
Random Forests: An Overview
An ensemble classification and regression technique
introduced by Leo Breiman
It generates a diversified ensemble of decision trees by adopting
two methods:
A bootstrap sample is used for the construction of each tree
(bagging), resulting in approximately 63.2% unique samples,
with the rest being repeats
At each node split, only a randomly drawn subset of features is
assessed for goodness (√F or log₂ F features are typically used,
where F is the total number of features)
Trees are allowed to grow without pruning
Typically 100 to 500 trees are used to form the ensemble
It is now considered among the best performing classifiers
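The ingredients above map directly onto scikit-learn's Random Forest implementation; the snippet below is a minimal sketch on synthetic data (the dataset and variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real classification task
X, y = make_classification(n_samples=400, n_features=16, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # within the typical 100-500 range
    max_features="sqrt",  # random sqrt(F) candidate features at each split
    bootstrap=True,       # each tree built on a bootstrap sample
    random_state=0,
).fit(Xtr, ytr)

accuracy = rf.score(Xte, yte)
```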
Random Forest Tops State-of-the-art Classifiers
179 classifiers
121 datasets (the whole UCI repository at the time of the
experiment)
Random Forest was ranked first, followed by SVM with a
Gaussian kernel
Reference
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D.
(2014). Do we need hundreds of classifiers to solve real world
classification problems? Journal of Machine Learning
Research, 15(1), 3133–3181.
Improving Random Forests
Source: Fawagreh, K., Gaber, M. M., & Elyan, E. (2014). Random forests: from early
developments to recent advancements. Systems Science & Control Engineering: An
Open Access Journal, 2(1), 602–609.
How is Diversity Related to Clustering?
The aim of any clustering algorithm is to produce cohesive
clusters that are well separated
A good clustering model diversifies among members of
different clusters
Inspired by this observation, we hypothesised that if trees in
the Random Forest are clustered, we can use a small subset
(typically one tree) from each cluster to produce a diversified
Random Forest
The benefits are twofold:
Increased diversification
A smaller ensemble, leading to faster classification of
unlabelled instances
CLUB-DRF
We termed the method CLUster
Based Diversified Random Forests
(CLUB-DRF)
Three stages are followed:
A Random Forest is induced
using the traditional method
Trees are clustered according to
their classification pattern
One or more representatives are chosen from each cluster to
form the pruned Random Forest
[Figure: CLUB-DRF pipeline — Training Set → Random Forest Algorithm → Parent RF (t1 … tn) → Clustering Algorithm → Cluster 1 … Cluster k → Representative Selection → CLUB-DRF (t1 … tk) → Testing Set]
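The three stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn, a held-out validation set to capture each tree's classification pattern, and KMeans on the 0/1 pattern vectors as a stand-in for the k-modes clustering step; names such as `parent` and `pruned` are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=0)

# Stage 1: induce the parent Random Forest using the traditional method
parent = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)

# Stage 2: cluster trees by their classification pattern on the held-out set
patterns = np.stack([t.predict(Xval) for t in parent.estimators_])
k = 10
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(patterns)

# Stage 3: one representative per cluster (here, the best performer)
correct = patterns == yval            # (n_trees, n_val) correctness matrix
pruned = []
for c in range(k):
    members = np.flatnonzero(clusters == c)
    best = members[correct[members].mean(axis=1).argmax()]
    pruned.append(parent.estimators_[best])
```

The pruned forest keeps one tree per cluster, so its members behave as differently from each other as the clustering allows.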
CLUB-DRF Settings
A number of settings are needed as follows:
The clustering algorithm used
The number of clusters of trees
The number of trees representing each cluster
The criteria for choosing the representatives
Random
Best performing
Experimental Setup
We tested the technique over 15 datasets from the UCI
repository
We generated 500 trees for the main Random Forest
We used k-modes to cluster the trees
We used the following values for k: 5, 10, 15, 20, 25, 30, 35,
and 40
We used one representative tree per cluster based on the Out
Of Bag (OOB) performance
The repeated hold-out method was used to estimate the performance
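The repeated hold-out estimate can be sketched as follows, a minimal illustration on synthetic data (the dataset, repetition count, and split ratio here are illustrative, not the paper's exact settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)

scores = []
for rep in range(10):  # repeated hold-out: re-split and re-train each repetition
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=rep)
    model = RandomForestClassifier(n_estimators=50, random_state=rep).fit(Xtr, ytr)
    scores.append(model.score(Xte, yte))

# Averaging over repetitions reduces the variance of a single hold-out split
mean_acc, std_acc = np.mean(scores), np.std(scores)
```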
Summarised Results
[Figure: bar chart comparing CLUB-DRF and RF — x-axis: Size (Number of Trees): 10, 20, 30, 40; y-axis: Number of Datasets]
Pruning Results
Sample of Detailed Results
How is Diversity Related to Outlier Detection?
Outliers are out-of-the-norm instances that are thought to be
generated by a different mechanism
By analogy, trees that are significantly different (diverse) from
the other trees in the Random Forest can be seen as
outliers
Local Outlier Factor (LOF) assigns a real number to each
instance to represent its peculiarity
Inspired by this analogy, we hypothesised that a diverse
ensemble of trees can be formed using an outlier detection
method
LOFB-DRF
We termed the method
Local Outlier Factor Based
Diversified Random Forests
(LOFB-DRF)
It follows similar steps to
CLUB-DRF
Each tree is assigned an LOF
value
Trees are then chosen
according to two criteria
Predictive accuracy
LOF value
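The LOF-based selection can be sketched as follows, a minimal illustration of the "LOF only" option using scikit-learn's `LocalOutlierFactor` over the trees' prediction patterns; the ensemble sizes and variable names are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
Xtr, Xval, ytr, yval = train_test_split(X, y, random_state=0)

parent = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
patterns = np.stack([t.predict(Xval) for t in parent.estimators_])

# LOF over the trees' prediction patterns: a higher score marks a tree
# whose behaviour deviates from its neighbours, i.e. a more diverse member
lof = LocalOutlierFactor(n_neighbors=40)
lof.fit(patterns)
diversity = -lof.negative_outlier_factor_   # sklearn stores negated LOF scores

# Keep the 20 most outlying (most diverse) trees as the pruned forest
keep = np.argsort(diversity)[-20:]
pruned = [parent.estimators_[i] for i in keep]
```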
LOFB-DRF Settings
A number of settings are needed, as follows:
The number of nearest neighbours used by LOF
Options for combining LOF with predictive accuracy:
Using LOF only, ruling out predictive accuracy
Using a combination strategy
Experimental Setup
We tested the technique over 10 datasets from the UCI
repository
We generated 500 trees for the main Random Forest
We used LOF with 40 nearest neighbours
We used [rank = normal(LOF) × accuracy] for each tree,
where normal(LOF), accuracy ∈ [0, 1]
Trees with the highest rank are chosen as representatives
We used the following numbers of representative trees: 5, 10,
15, 20, 25, 30, 35, and 40
The repeated hold-out method was used to estimate performance
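The rank formula above can be sketched as a minimal selection routine — the LOF scores and accuracies below are synthetic, and min-max normalisation is assumed as the `normal(·)` mapping into [0, 1]:

```python
import numpy as np

def select_representatives(lof_scores, accuracies, k):
    """Rank trees by normal(LOF) * accuracy and return the indices
    of the k highest-ranked trees (both factors lie in [0, 1])."""
    lof = np.asarray(lof_scores, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    # Min-max normalise LOF into [0, 1] so diversity and accuracy
    # contribute on the same scale (assumes the LOF scores vary).
    normal_lof = (lof - lof.min()) / (lof.max() - lof.min())
    rank = normal_lof * acc
    return np.argsort(rank)[::-1][:k]

# Synthetic example: tree 2 is both diverse (high LOF) and accurate,
# so it should be selected first.
lof_scores = [1.0, 1.1, 2.5, 1.2]
accuracies = [0.90, 0.85, 0.88, 0.95]
print(select_representatives(lof_scores, accuracies, 2))
```

Multiplying the two factors means a tree must score reasonably on both criteria: a highly diverse but inaccurate tree, or an accurate but redundant one, is penalised.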
Summarised Results
[Bar chart comparing LOFB-DRF with RF: number of datasets
(y-axis, 0–6) against pruned ensemble size in trees (x-axis: 10,
20, 30, 40)]
Pruning Results
Sample of Detailed Results
Summary
Random Forest has proved its superiority over the last few years
Two methods were presented in this talk, aiming at
diversifying and pruning Random Forests
Results showed the potential of these two methods to further
enhance the predictive accuracy of Random Forest
The high level of pruning makes these techniques candidates
for real-time applications, as the number of trees to be
traversed is significantly reduced
Future Work
In CLUB-DRF:
Exploring other methods for choosing tree representatives from
each cluster (e.g., varying the number of representatives per
cluster)
Using other clustering techniques
In LOFB-DRF:
Exploring other options for combining the LOF value and
predictive accuracy
Using LOF and predictive accuracy for the choice of tree
representatives in each cluster
Applying both methods to other ensemble classification
techniques
Q & A
Thanks for listening!
Contact Details
Dr Mohamed Medhat Gaber
E-mail: m.gaber1@rgu.ac.uk
Webpage: http://mohamedmgaber.weebly.com/
LinkedIn: https://www.linkedin.com/profile/view?id=21808352
Twitter: https://twitter.com/mmmgaber
ResearchGate: https://www.researchgate.net/profile/Mohamed_Gaber16
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 

Unsupervised Learning Techniques to Diversifying and Pruning Random Forest

  • 1. Unsupervised Learning Techniques to Diversifying and Pruning Random Forest
    Dr Mohamed Medhat Gaber, School of Computing Science and Digital Media, Robert Gordon University, 27 January 2015
  • 2. Acknowledgement
    Work done in collaboration with PhD student Khaled Fawagreh and co-supervisor Dr Eyad Elyan
  • 3. Outline
    1 Background: Data Classification, Ensemble Classification, Ensemble Diversity, Random Forests
    2 Clustering and Ensemble Diversity: CLUB-DRF, Experimental Study
    3 Outlier Scoring and Ensemble Diversity: LOFB-DRF, Experimental Study
    4 Summary and Future Work
  • 4–7. What is Data Classification?
    Data classification is the process of assigning a class (labelling) to a data instance, based on the values of a set of predictive attributes (features). The process has two stages:
    1 Model construction: a potentially large number of "labelled" instances are fed to a classification technique to build a model (classifier).
    2 Model usage: once the model is constructed, it can be deployed and used to classify "unlabelled" instances.
    A large number of techniques have been proposed for data classification (e.g., decision trees, artificial neural networks, and support vector machines). Predictive accuracy has been the major concern when designing a new classification technique, followed by the time needed for model construction and usage.
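The two-stage process above can be sketched with a deliberately trivial toy classifier (all names and the weather-style data are invented for illustration, not from the slides):

```python
# Minimal sketch of the two-stage classification process: build a model from
# labelled instances, then use it to classify unlabelled ones.
from collections import Counter

def build_model(labelled_instances):
    """Stage 1 (model construction): learn from labelled instances."""
    # For each value of the single predictive attribute, remember the majority class.
    by_value = {}
    for x, y in labelled_instances:
        by_value.setdefault(x, []).append(y)
    return {x: Counter(ys).most_common(1)[0][0] for x, ys in by_value.items()}

def classify(model, x):
    """Stage 2 (model usage): assign a class to an unlabelled instance."""
    return model.get(x)

training = [("sunny", "play"), ("sunny", "play"), ("rainy", "stay"), ("sunny", "stay")]
model = build_model(training)
print(classify(model, "sunny"))  # -> "play"
```

Real techniques differ only in what happens inside `build_model` and `classify`; the deployment pattern is the same.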
  • 8–13. Decision Tree Classification Techniques
    Almost all decision trees are constructed using a similar procedure:
    Attributes (features) are represented in internal nodes, with their values given on the links for tree traversal (a variation of this exists for binary decision trees)
    Leaf nodes are class labels
    Decision trees mainly vary in the goodness measure used to find the best attribute to split on (e.g., information gain, gain ratio, Gini index, and Chi-square)
    The first attribute, called the root, is the best attribute (according to some goodness measure) to split on
    An iterative process to build subtrees follows, finding the best attribute (attribute = value) to split on at each iteration
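Two of the goodness measures named above can be computed directly from class-label counts; a small sketch (the function names and the toy "+/-" labels are illustrative):

```python
# Information gain and Gini index for evaluating a candidate split.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

parent = ["+", "+", "-", "-"]
split = [["+", "+"], ["-", "-"]]          # a perfect split on some attribute
print(information_gain(parent, split))    # -> 1.0
print(gini(parent))                       # -> 0.5
```

The attribute (or attribute = value) maximising the chosen measure becomes the split at each node.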
  • 14–18. Ensemble Classification
    Combining a number of classifiers that vote towards the winning class has been thoroughly investigated by the machine learning and data mining communities.
    Bagging, boosting and stacking are among the major approaches to building ensembles of classifiers.
    Bagging uses bootstrap sampling to generate diverse samples of the dataset.
    Boosting builds classifiers in a sequence, encouraging later classifiers to become expert at classifying instances that earlier classifiers in the sequence misclassified.
    Stacking uses a hierarchy of classifiers that generates a new dataset on which a single classifier is built.
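Bagging in particular is simple enough to sketch end to end; this is an illustrative toy (the majority-class base learner stands in for a real decision tree):

```python
# Bagging sketch: bootstrap-sample the data, train one base learner per sample,
# classify by majority vote across the ensemble.
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Sample |data| instances with replacement (bagging's input manipulation)."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

def train_base_learner(sample):
    """Toy base learner: always predicts the majority class of its training sample."""
    majority = majority_vote([y for _, y in sample])
    return lambda x: majority

rng = random.Random(0)
data = [(i, "A" if i < 9 else "B") for i in range(10)]  # 9 "A" labels, 1 "B"
ensemble = [train_base_learner(bootstrap_sample(data, rng)) for _ in range(11)]
pred = majority_vote([clf(3) for clf in ensemble])
print(pred)
```

Because each learner sees a different bootstrap sample, the learners differ, which is exactly the diversity the next slides discuss.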
  • 19–24. Diversity and Predictive Accuracy
    Diversity among members of the ensemble is key to predictive accuracy.
    There are many ways of measuring such diversity; it is not a straightforward process.
    Regardless of the measure used, diversity has been the target of a number of 'diversity creation' methods:
    Bagging and boosting enforce diversity by input manipulation
    Stacking typically imposes diversity by using a number of different classifiers
    Error-correcting output codes manipulate the output to create diversity
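One common (though by no means the only) diversity measure is average pairwise disagreement over a test set; a minimal sketch with invented prediction vectors:

```python
# Average pairwise disagreement: one of many ways to quantify ensemble diversity.
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers disagree."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def ensemble_diversity(all_preds):
    """Mean disagreement over all classifier pairs in the ensemble."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

preds = [
    ["+", "+", "-", "-"],   # classifier 1's predictions on four instances
    ["+", "-", "-", "-"],   # classifier 2
    ["-", "+", "-", "+"],   # classifier 3
]
print(ensemble_diversity(preds))  # averages 1/4, 2/4 and 3/4 -> 0.5
```

A score of 0 would mean the classifiers are identical (no benefit from combining them); higher values indicate more diverse members.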
  • 25–30. Random Forests: An Overview
    An ensemble classification and regression technique introduced by Leo Breiman.
    It generates a diversified ensemble of decision trees using two methods:
    A bootstrap sample is used for the construction of each tree (bagging), resulting in approximately 63.2% unique instances, with the rest being repeats
    At each node split, only a subset of features is drawn randomly to assess the goodness of each feature/attribute (√F or log2 F is used, where F is the total number of features)
    Trees are allowed to grow without pruning.
    Typically 100 to 500 trees are used to form the ensemble.
    It is now considered among the best performing classifiers.
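Both sources of randomisation are easy to verify numerically. The 63.2% figure is 1 − 1/e, the expected fraction of distinct instances in a bootstrap sample; the feature counts follow directly from √F and log2 F (F = 25 here is an arbitrary example):

```python
# Empirically confirm the ~63.2% unique-instance rate of a bootstrap sample,
# and the size of the random feature subset tried at each split.
import math
import random

rng = random.Random(42)
n = 100_000
unique_fraction = len({rng.randrange(n) for _ in range(n)}) / n
print(round(unique_fraction, 3))             # close to 1 - 1/e ≈ 0.632

F = 25                                       # total number of features (example)
print(int(math.sqrt(F)), int(math.log2(F)))  # features tried per split: 5 4

# Drawing one such random feature subset for a node split:
subset = rng.sample(range(F), int(math.sqrt(F)))
print(len(subset))                           # -> 5
```

Together, bagging (row randomness) and feature subsampling (column randomness) are what de-correlate the trees.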
  • 31–33. Random Forest Tops State-of-the-art Classifiers
    179 classifiers; 121 datasets (the whole UCI repository at the time of the experiment). Random Forest ranked first, followed by SVM with a Gaussian kernel.
    Reference: Fernandez-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
  • 34. Improving Random Forests
    Source: Fawagreh, K., Gaber, M. M., & Elyan, E. (2014). Random forests: from early developments to recent advancements. Systems Science & Control Engineering: An Open Access Journal, 2(1), pp. 602-609.
  • 35–38. How is Diversity Related to Clustering?
    The aim of any clustering algorithm is to produce cohesive clusters that are well separated.
    A good clustering model diversifies among members of different clusters.
    Inspired by this observation, we hypothesised that if the trees in a Random Forest are clustered, we can use a small subset (typically one tree) from each cluster to produce a diversified Random Forest.
    The benefits are twofold: increased diversification, and a smaller ensemble, leading to faster classification of unlabelled instances.
  • 39–40. CLUB-DRF
    We termed the method CLUster Based Diversified Random Forests (CLUB-DRF). Three stages are followed:
    A Random Forest is induced using the traditional method
    Trees are clustered according to their classification pattern
    One or more representatives are chosen from each cluster to form the pruned Random Forest
    [Pipeline diagram: Training Set → Random Forest Algorithm → Parent RF (t1 … tn) → classification patterns C(t1, T) … C(tn, T) → Clustering Algorithm → Cluster 1 … Cluster k → Representative Selection → CLUB-DRF (t1 … tk) → Testing Set]
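The three stages can be sketched compactly if each tree is represented by its classification pattern C(t, T), i.e., its vector of predictions on the training set. This is an illustrative sketch, not the authors' implementation: the tiny k-modes, the `club_drf` helper, and the toy prediction vectors standing in for trees are all invented for the example.

```python
# CLUB-DRF sketch: cluster trees by their classification patterns (categorical
# vectors, Hamming distance) and keep the most accurate tree per cluster.
import random
from collections import Counter

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def k_modes(vectors, k, rng, iters=10):
    """Tiny k-modes: assign each vector to the nearest mode, then recompute
    modes as the per-position majority value of each cluster."""
    modes = rng.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[min(range(k), key=lambda i: hamming(v, modes[i]))].append(v)
        modes = [
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*c)) if c else modes[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

def club_drf(tree_pred_vectors, true_labels, k, rng):
    """Stages 2 and 3: cluster the patterns, keep the best tree per cluster."""
    clusters = k_modes(tree_pred_vectors, k, rng)
    def accuracy(v):
        return sum(p == y for p, y in zip(v, true_labels)) / len(true_labels)
    return [max(c, key=accuracy) for c in clusters if c]

rng = random.Random(1)
labels = ("A", "A", "B", "B")
# Each tuple is one tree's predictions on the training set (toy stand-ins for trees).
trees = [("A", "A", "B", "B"), ("A", "A", "B", "A"), ("B", "B", "B", "B"),
         ("B", "A", "B", "B"), ("A", "B", "B", "B")]
pruned = club_drf(trees, labels, k=2, rng=rng)
print(len(pruned))  # at most k representatives survive the pruning
```

The pruned ensemble keeps at most one representative per cluster, so it is both smaller and, by construction, composed of mutually dissimilar trees.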
  • 41–44. CLUB-DRF Settings
    A number of settings are needed, as follows:
    The clustering algorithm used
    The number of clusters of trees
    The number of trees representing each cluster
    The criteria for choosing the representatives (random, or best performing)
  • 45–50. Experimental Setup
    We tested the technique on 15 datasets from the UCI repository.
    We generated 500 trees for the main Random Forest.
    We used k-modes to cluster the trees, with the following values for k: 5, 10, 15, 20, 25, 30, 35, and 40.
    We used one representative tree per cluster, chosen by Out-Of-Bag (OOB) performance.
    The repeated hold-out method was used to estimate performance.
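OOB performance, used above to pick each cluster's representative, scores a tree on exactly those instances left out of its bootstrap sample. A minimal sketch (the `parity_tree` stand-in and the toy dataset are invented for illustration):

```python
# Out-Of-Bag scoring: evaluate a tree on the instances absent from its
# bootstrap sample, so no separate validation set is needed.
import random

def oob_score(tree_predict, data, in_bag_indices):
    """Accuracy of one tree on its Out-Of-Bag instances."""
    oob = [(x, y) for i, (x, y) in enumerate(data) if i not in in_bag_indices]
    return sum(tree_predict(x) == y for x, y in oob) / len(oob)

rng = random.Random(7)
data = [(i, i % 2) for i in range(20)]
in_bag = {rng.randrange(len(data)) for _ in data}  # one tree's bootstrap indices
parity_tree = lambda x: x % 2                       # hypothetical perfect tree
print(oob_score(parity_tree, data, in_bag))         # -> 1.0
```

Since roughly 36.8% of instances are OOB for any given tree, each tree gets a free, unbiased accuracy estimate from data it never trained on.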
• 51. CLUB-DRF: Summarised Results
  [Bar chart: Number of Datasets (y-axis) against pruned Size (Number of Trees: 10, 20, 30, 40), comparing CLUB-DRF with RF]
• 52. CLUB-DRF: Pruning Results
• 53. CLUB-DRF: Sample of Detailed Results
• 54-57. LOFB-DRF: How is Diversity Related to Outlier Detection?
  - Outliers are out-of-the-norm instances that are thought to be generated by a different mechanism
  - By analogy, trees that are significantly different (diverse) from the other trees in the Random Forest can be seen as outliers
  - Local Outlier Factor (LOF) assigns a real number to each instance to represent its peculiarity
  - Inspired by this analogy, we hypothesised that a diverse ensemble of trees can be formed using an outlier detection method
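To make the LOF score concrete, here is a minimal self-contained computation (not the authors' implementation; treating each tree as a point and using Euclidean distance is an assumption for illustration):

```python
import numpy as np

def lof(X, k):
    """Local Outlier Factor for each row of X; scores near 1 indicate
    inliers, markedly larger scores indicate outliers."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    np.fill_diagonal(D, np.inf)           # a point is not its own neighbour
    knn = np.argsort(D, axis=1)[:, :k]    # indices of the k nearest neighbours
    kdist = np.sort(D, axis=1)[:, k - 1]  # k-distance of every point
    # Reachability distance: reach(a, b) = max(k-distance(b), d(a, b))
    reach = np.maximum(kdist[knn], D[np.arange(n)[:, None], knn])
    lrd = k / reach.sum(axis=1)           # local reachability density
    return lrd[knn].mean(axis=1) / lrd    # LOF = mean neighbour lrd / own lrd

# Five mutually similar "trees" and one markedly different (diverse) one
pts = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], float)
scores = lof(pts, k=3)   # scores[5] is far above 1; the rest sit near 1
```

In the diversity analogy, the point at `[10, 10]` plays the role of a tree whose behaviour deviates most from the rest of the forest, and it receives by far the highest peculiarity score.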
• 58-61. LOFB-DRF
  - We termed the method Local Outlier Factor Based Diversified Random Forests (LOFB-DRF)
  - It follows similar steps to CLUB-DRF
  - Each tree is assigned an LOF value
  - Trees are then chosen according to two criteria: predictive accuracy and LOF value
• 62-63. LOFB-DRF Settings
  A number of settings are needed:
  - The LOF setting for the number of nearest neighbours
  - The option for combining LOF with predictive accuracy: using LOF only (ruling out predictive accuracy), or using a combination strategy
• 64-70. LOFB-DRF: Experimental Setup
  - We tested the technique over 10 datasets from the UCI repository
  - We generated 500 trees for the main Random Forest
  - We used LOF with 40 nearest neighbours
  - We computed [rank = normal(LOF) × accuracy] for each tree, where normal(LOF), accuracy ∈ [0, 1]
  - Trees with the highest rank are chosen as representatives
  - We used the following numbers of representative trees: 5, 10, 15, 20, 25, 30, 35, and 40
  - A repeated hold-out method was used to estimate performance
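The ranking step above can be sketched directly; using min-max scaling to map LOF into [0, 1] is an assumption about how normal(LOF) is computed:

```python
import numpy as np

def select_trees(lof_scores, accuracies, m):
    """Rank trees by normal(LOF) * accuracy and return the indices of the
    top m. Min-max scaling puts normal(LOF) into [0, 1], as on the slide."""
    lo, hi = lof_scores.min(), lof_scores.max()
    normal_lof = (lof_scores - lo) / (hi - lo)
    rank = normal_lof * accuracies
    return np.argsort(rank)[::-1][:m]   # indices of the m highest-ranked trees

# Toy run: trees that are both diverse (high LOF) and accurate win
lof_scores = np.array([1.0, 1.0, 5.0, 5.0])
accuracies = np.array([0.9, 0.5, 0.9, 0.5])
chosen = select_trees(lof_scores, accuracies, m=2)   # -> indices 2 and 3
```

Multiplying the two terms means a tree must score well on both criteria: an accurate but unremarkable tree (index 0) and a diverse but weak tree both rank below a tree that is diverse and accurate (index 2).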
• 71. LOFB-DRF: Summarised Results
  [Bar chart: Number of Datasets (y-axis) against pruned Size (Number of Trees: 10, 20, 30, 40), comparing LOFB-DRF with RF]
• 72. LOFB-DRF: Pruning Results
• 73. LOFB-DRF: Sample of Detailed Results
• 74-77. Summary
  - Random Forest has proved its superiority over the last few years
  - Two methods were presented in this talk, aiming at diversifying and pruning Random Forests
  - Results showed the potential of these two methods to further enhance the predictive accuracy of Random Forests
  - The high level of pruning makes these techniques candidates for real-time applications, as the number of trees to be traversed is significantly reduced
• 78-82. Future Work
  In CLUB-DRF:
  - Exploring other methods for choosing tree representatives from each cluster (e.g., varying the number of representatives per cluster)
  - Using other clustering techniques
  In LOFB-DRF:
  - Exploring other options for combining the LOF value and predictive accuracy
  - Using LOF and predictive accuracy for the choice of tree representatives in each cluster
  - Applying both methods to other ensemble classification techniques
• 83. Q & A
  Thanks for listening!
  Contact Details: Dr Mohamed Medhat Gaber
  E-mail: m.gaber1@rgu.ac.uk
  Webpage: http://mohamedmgaber.weebly.com/
  LinkedIn: https://www.linkedin.com/profile/view?id=21808352
  Twitter: https://twitter.com/mmmgaber
  ResearchGate: https://www.researchgate.net/profile/Mohamed_Gaber16