The document discusses random forest, an ensemble machine learning technique. It explains that random forest creates multiple decision trees from random subsets of a training dataset and aggregates their predictions. It provides an analogy where asking multiple friends for a movie recommendation is like random forest getting predictions from several decision trees to improve accuracy over a single decision tree. The document also covers advantages of random forest like robustness to overfitting, speed and scalability, and generally better performance than single algorithms. It concludes with points about model size, training speed, and properly setting randomization parameters.
2. www.edureka.co/data-science
What will you learn today?
What are Ensemble techniques?
Introduction to Random Forest
Understanding Random Forest Algorithm
Hands On – Applying Random Forest
Random Forest
Random forest is an ensemble of decision
trees. Each tree is trained on a random sample
of the training dataset, and the predictions of
all trees are aggregated into a single result.
Random Forest is one of the most popular
ensemble techniques used in Data Science.
Both R and Python provide packages for Random Forest implementation.
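To make "a random sample of the training dataset" concrete, here is a minimal sketch in plain Python of the two kinds of randomness a random forest typically uses: a bootstrap sample of the rows and a random subset of the features. The helper names and the toy rows are hypothetical and only illustrate the idea; real packages handle this internally.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices out of n_features."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy training set: (genre, did you like it?)
rows = [("action", "yes"), ("drama", "no"), ("action", "yes"), ("comedy", "no")]

sample = bootstrap_sample(rows, rng)                       # rows for one tree
features = random_feature_subset(n_features=5, k=2, rng=rng)  # features for one tree
```

Because rows are drawn with replacement, some rows appear more than once in a sample and others not at all, so every tree sees a slightly different dataset.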
Understanding Random Forest
Suppose you can't decide whether to watch
the movie “Edge of Tomorrow”.
You can do one of the following:
1. Ask your best friend whether you will
like the movie.
2. Ask your group of friends.
Understanding Random Forest
In order to answer, your best friend first needs
to figure out what movies you like, so you give
her a bunch of movies and tell her whether you
liked each one or not (i.e., you give her a
labelled training set).
Example: Do you like movies starring Emily Blunt?
[Figure: decision tree for “Ask Best Friend”. Questions at the nodes: “Is it based on a true incident?”, “Does Emily Blunt star in it?”, “Is she the main lead?”. Each Yes/No branch ends in either “Yes, you will like the movie” or “No, you will not like the movie”.]
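The best friend's questions can be read as a single decision tree. The sketch below writes such a tree as nested conditions in Python. The order of the questions, the outcome of each branch, the dictionary keys, and the values given for the movie are all assumptions made for illustration; the slide only provides the questions and the two possible answers.

```python
YES = "Yes, you will like the movie"
NO = "No, you will not like the movie"

def best_friend_predicts(movie):
    """Walk a hand-written decision tree; the branch order is an assumption."""
    if movie["based_on_true_incident"]:
        return NO
    if not movie["stars_emily_blunt"]:
        return NO
    if movie["emily_blunt_is_lead"]:
        return YES
    return NO

# Illustrative description of the movie being considered (assumed values).
edge_of_tomorrow = {
    "based_on_true_incident": False,
    "stars_emily_blunt": True,
    "emily_blunt_is_lead": True,
}

answer = best_friend_predicts(edge_of_tomorrow)
```

A real decision tree learns these questions and their order from the labelled training set instead of having them written by hand.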
Understanding Random Forest
But your best friend might not always generalize your
preferences very well (i.e., she overfits).
To get more accurate recommendations, you ask a bunch of
your friends, e.g. Friend #1, Friend #2, and Friend #3, and
they vote on whether you will like a movie.
The majority of the votes decides the final outcome.
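The voting step can be sketched in a few lines of Python. The function name and the example votes are hypothetical; the point is only that the final answer is whichever vote occurs most often.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common vote; on a tie, the vote seen first wins."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical answers from Friend #1, Friend #2, and Friend #3.
friend_votes = ["yes", "no", "yes"]
decision = majority_vote(friend_votes)
```

With an odd number of friends and a yes/no question a tie cannot occur, which is one reason an odd number of voters is convenient.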
Understanding Random Forest
[Figure: Friend 1, Friend 2, and Friend 3 each predict from what they know about your taste and then vote.
Observations used: you didn't like ‘Far and Away’, you liked ‘Oblivion’, you did not like ‘Top Gun’, you loved ‘Godzilla’, you like action movies, you like Tom Cruise, you hate Tom Cruise, you like his pairing with Emily Blunt.
Each branch ends in “Yes, you will like the movie” or “No, you will not like the movie”.]
Random Forest Advantages
It is robust against overfitting.
It is fast and scalable, since the trees can be trained independently and in parallel.
It gives better results as the number of training examples increases.
It can also be used for clustering, statistical inference, and feature selection.
Ensemble methods (e.g. RF) generally outperform single algorithms.
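One way to see why a vote can beat a single predictor is a small probability calculation. The sketch below assumes that every voter is right with the same probability and that voters err independently. That assumption is an idealization: trees in a real forest are correlated, so the actual gain is smaller. The function name and the 0.7 accuracy are illustrative values only.

```python
from math import comb

def majority_accuracy(n_voters, p_correct):
    """Probability that more than half of n independent voters are right."""
    needed = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )

single = 0.7                            # assumed accuracy of one voter
three = majority_accuracy(3, single)    # 0.7**3 + 3 * 0.7**2 * 0.3 = 0.784
many = majority_accuracy(25, single)    # more voters, higher accuracy
```

The randomization in a random forest is there to make the trees less alike, which moves the forest closer to this independent-voter picture.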
Points to watch out for while using RF
The models tend to be very large.
They are slower to train than a single decision tree.
The randomization parameters need to be set well (selection of nodes, number of trees, randomization of
instance variables).
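To show where these parameters enter, here is a minimal from-scratch sketch in plain Python. It is not how any particular package implements random forest: each "tree" is only a one-question stump, and the names `n_trees`, `max_features`, and `seed`, as well as the toy data, are assumptions chosen for illustration.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_ids):
    """Pick the feature in feature_ids whose values best predict the labels."""
    best = None
    for f in feature_ids:
        # For each value of feature f, predict the label seen most often with it.
        by_value = {}
        for row, label in zip(rows, labels):
            by_value.setdefault(row[f], Counter())[label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(rule[row[f]] == label for row, label in zip(rows, labels))
        if best is None or correct > best[0]:
            best = (correct, f, rule)
    default = Counter(labels).most_common(1)[0][0]  # used for unseen values
    return best[1], best[2], default

def train_forest(rows, labels, n_trees, max_features, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]             # bootstrap rows
        feats = sorted(rng.sample(range(n_features), max_features))
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest

def predict(forest, row):
    """Let every stump vote and return the majority label."""
    votes = [rule.get(row[f], default) for f, rule, default in forest]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical data: (is_action, stars_tom_cruise, based_on_true_incident)
rows = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (1, 0, 1),
        (0, 1, 0), (0, 0, 0), (0, 1, 1), (0, 0, 1)]
labels = ["like"] * 4 + ["dislike"] * 4

forest = train_forest(rows, labels, n_trees=25, max_features=2)
prediction = predict(forest, (1, 1, 0))
```

More trees make the vote more stable but the model larger and slower to train, while a smaller `max_features` makes the trees more different from each other at the cost of weaker individual trees.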