This document provides an overview of the decision tree and random forest machine learning algorithms. It discusses how decision trees partition the feature space to make predictions and how random forests address overfitting by building an ensemble of decorrelated trees, each grown on a bootstrap sample with a random subset of attributes considered at each split. The document demonstrates fitting decision trees and random forests in R on a wine dataset and compares their performance, with the random forest achieving higher accuracy. It also discusses tuning random forests via grid search.
Machine Learning Workshop
1. Hands-on Classification: Decision Trees and Random Forests
Predictive Analytics Meetup Group
Machine Learning Workshop
December 2, 2012
Daniel Gerlanc, Managing Director
Enplus Advisors, Inc.
www.enplusadvisors.com
dgerlanc@enplusadvisors.com
12. Use R to do the partitioning.
library(rpart)       # recursive partitioning trees
library(rpart.plot)  # prp() for plotting rpart trees
tree.1 <- rpart(Type ~ ., data=wine)
prp(tree.1, type=4, extra=2)
• See the 'rpart' and 'rpart.plot' R packages.
• Many parameters available to control the fit.
See rf-2.R
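As an illustrative sketch of those fit controls (the specific values here are arbitrary placeholders, not settings from the workshop), a control object can be passed to rpart:

library(rpart)
# Illustrative values only:
#   cp       - complexity parameter: minimum improvement a split must add
#   minsplit - minimum number of records in a node before a split is attempted
#   maxdepth - maximum depth of any node in the tree
ctrl <- rpart.control(cp = 0.01, minsplit = 20, maxdepth = 10)
tree.ctrl <- rpart(Type ~ ., data = wine, control = ctrl)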
13. Make predictions on a test dataset
predict(tree.1, newdata=wine, type="vector")
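As a minimal sketch (assuming wine$Type holds the true labels), the predictions can be cross-tabulated against the actual classes; type="class" returns factor labels, which are easier to tabulate than the numeric codes from type="vector":

pred <- predict(tree.1, newdata = wine, type = "class")
# Rows are predicted classes, columns are actual classes.
table(Predicted = pred, Actual = wine$Type)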
14. How'd it do?
Guessing (majority class): 60.11% accuracy
CART: 94.38% Accuracy
• Precision: 92.96% (66 / 71)
• Sensitivity/Recall: 92.96% (66 / 71)
           Actual
Predicted  Grig   No
Grig         66    5
No            5  102
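The metrics above follow directly from this matrix. A sketch of the arithmetic, assuming the positive class appears as the level "Grig" in both factors:

cm <- table(Predicted = pred, Actual = wine$Type)
accuracy  <- sum(diag(cm)) / sum(cm)                 # (66 + 102) / 178 = 94.38%
precision <- cm["Grig", "Grig"] / sum(cm["Grig", ])  # 66 / 71 = 92.96%
recall    <- cm["Grig", "Grig"] / sum(cm[, "Grig"])  # 66 / 71 = 92.96%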
15. Decision Tree Problems
• Overfitting the data
• May not use all relevant features
• Decision boundaries are always perpendicular to the feature axes (axis-aligned splits)
23. RF Parameters in R
Most important parameters are:
• ntree: number of trees. Default: 500.
• mtry: number of variables to randomly select at each node. Default: square root of # predictors for classification; # predictors / 3 for regression.
• nodesize: minimum number of records in a terminal node. Default: 1 for classification; 5 for regression.
• sampsize: number of records to select in each bootstrap sample. Default: 63.2% of records.
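Written out as an explicit call, those defaults look roughly like this sketch (one caveat: in the randomForest package, the default sampsize when sampling with replacement is the full n, of which roughly 63.2% of records are unique within each bootstrap sample):

library(randomForest)
set.seed(1)                  # illustrative seed for reproducibility
p <- ncol(wine) - 1          # number of predictor columns
rf.1 <- randomForest(Type ~ ., data = wine,
                     ntree = 500,              # number of trees
                     mtry = floor(sqrt(p)),    # classification default
                     nodesize = 1,             # classification default
                     sampsize = nrow(wine))    # records per bootstrap sample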
24. How'd it do?
Guessing Accuracy: 60.11%
Random Forest: 98.31% Accuracy
• Precision: 95.77% (68 / 71)
• Sensitivity/Recall: 100% (68 / 68)
           Actual
Predicted  Grig   No
Grig         68    3
No            0  107
27. Benefits of RF
• Good performance with default settings
• Relatively easy to parallelize (see the sketch after this list)
• Many implementations
• R, Weka, RapidMiner, Mahout
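As a hedged sketch of the parallelism point (the foreach/doParallel setup is an assumption, not workshop code), sub-forests can be grown on separate workers and merged with randomForest::combine():

library(randomForest)
library(doParallel)   # also attaches foreach and parallel

cl <- makeCluster(4)  # 4 workers; adjust to your machine
registerDoParallel(cl)

# Grow 125 trees on each of 4 workers, then merge into one 500-tree forest.
rf.par <- foreach(nt = rep(125, 4), .combine = randomForest::combine,
                  .packages = "randomForest") %dopar%
  randomForest(Type ~ ., data = wine, ntree = nt)

stopCluster(cl)

Note that combine() drops the out-of-bag error components, so a merged forest should be evaluated on held-out data.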
28. References
• Liaw, A. and Wiener, M. (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
• Breiman, Leo. Classification and Regression Trees. Belmont, Calif: Wadsworth International Group, 1984. Print.
• Breiman, Leo and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm
Editor's notes
John, Dave, and I have spoken a bit about the motivations for using Machine Learning techniques.