Project II
Data Mining a
Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set Characteristics: Multivariate
Attribute Characteristics: Categorical
Number of Instances: 8124
Number of Attributes: 22
Area: Life
Date Donated: 1987

This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.

Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
• 22 independent attributes
• 1 class attribute (can you eat it?)
  Edible: 4,208 (51.8%)
  Poisonous: 3,916 (48.2%)
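The class split can be checked directly from the counts. Because edible is the majority class, 51.8% is also the accuracy of always answering "edible", a useful baseline when reading the 98.52% and 100% figures that follow. `class_distribution` is a hypothetical helper, sketched here for illustration.

```python
from collections import Counter

def class_distribution(labels):
    """Return {label: (count, percent rounded to 0.1)} for a list of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}

# Counts from the slide: 4,208 edible and 3,916 poisonous out of 8,124
labels = ["edible"] * 4208 + ["poisonous"] * 3916
print(class_distribution(labels))
```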
Mushroom Dataset
22 attributes total:
• 18 describe the mushroom itself
• 4 others: habitat, population, bruises, odor
Odor attribute, 1R Learner
The simplest rule: 98.52% accuracy
A = almond      N = none
C = creosote    P = pungent
F = foul        S = spicy
L = anise       Y = fishy
M = musty
[Chart: odor values a, c, f, l, m, n, p, s, y]
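1R ("one rule") builds, for each attribute, a rule that maps every value to its most frequent class, then keeps the attribute whose rule makes the fewest errors on the training data. The sketch below is a simplified, hypothetical implementation for categorical attributes only; it is not Weka's OneR, and it omits refinements such as a minimum bucket size. The toy rows are invented for illustration and use the UCI single-letter codes.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Simplified 1R learner for categorical data.

    rows: list of dicts {attribute: value}; labels: parallel list of classes.
    Returns (attribute, {value: predicted class}, training accuracy).
    """
    best = None
    for attr in rows[0]:
        # Tally the classes seen for each value of this attribute
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        # Each value predicts its most frequent class
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # Errors: instances whose class differs from their value's prediction
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (attr, rule, errors)
    attr, rule, errors = best
    return attr, rule, 1 - errors / len(labels)

# Hypothetical toy rows (odor: n = none, f = foul, a = almond;
# cap-color: w = white, n = brown), not real dataset records
rows = [
    {"odor": "n", "cap-color": "w"},
    {"odor": "n", "cap-color": "n"},
    {"odor": "f", "cap-color": "n"},
    {"odor": "f", "cap-color": "w"},
    {"odor": "a", "cap-color": "w"},
]
labels = ["e", "e", "p", "p", "e"]
print(one_r(rows, labels))
```

On this toy data odor separates the classes perfectly while cap-color makes two errors, so odor is chosen, mirroring how the odor attribute wins on the full dataset.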
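J48 Tree: 100% Classification
E = Edible, P = Poisonous

odor = almond: E
odor = creosote: P
odor = foul: P
odor = anise: E
odor = musty: P
odor = none
|   spore-print-color = black: E
|   spore-print-color = brown: E
|   spore-print-color = buff: E
|   spore-print-color = chocolate: E
|   spore-print-color = green: P
|   spore-print-color = orange: E
|   spore-print-color = purple: E
|   spore-print-color = white
|   |   gill-size = narrow
|   |   |   gill-spacing = close: P
|   |   |   gill-spacing = crowded
|   |   |   |   population = abundant: E
|   |   |   |   population = clustered: P
|   |   |   |   population = numerous: E
|   |   |   |   population = scattered: E
|   |   |   |   population = several: E
|   |   |   |   population = solitary: E
|   |   |   gill-spacing = distant: E
|   |   gill-size = broad: E
|   spore-print-color = yellow: E
odor = pungent: P
odor = spicy: P
odor = fishy: P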
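The J48 tree on this slide can be written as a nested mapping and applied by walking from the root, following the branch for each tested attribute until a leaf is reached. This is a hand transcription for illustration: the names of the narrow/broad, close/crowded/distant and abundant-to-solitary splits (gill-size, gill-spacing, population) come from the UCI attribute list rather than from the slide, and `TREE` and `classify` are hypothetical names.

```python
# Internal node: (attribute, {value: subtree}); leaf: "E" (edible) or "P" (poisonous)
TREE = ("odor", {
    "almond": "E", "creosote": "P", "foul": "P", "anise": "E", "musty": "P",
    "none": ("spore-print-color", {
        "black": "E", "brown": "E", "buff": "E", "chocolate": "E",
        "green": "P", "orange": "E", "purple": "E", "yellow": "E",
        "white": ("gill-size", {
            "broad": "E",
            "narrow": ("gill-spacing", {
                "close": "P",
                "distant": "E",
                "crowded": ("population", {
                    "abundant": "E", "clustered": "P", "numerous": "E",
                    "scattered": "E", "several": "E", "solitary": "E",
                }),
            }),
        }),
    }),
    "pungent": "P", "spicy": "P", "fishy": "P",
})

def classify(tree, mushroom):
    """Descend from the root until a leaf label ("E" or "P") is reached."""
    while not isinstance(tree, str):
        attr, branches = tree
        tree = branches[mushroom[attr]]  # KeyError if a tested attribute is missing
    return tree

print(classify(TREE, {"odor": "none", "spore-print-color": "green"}))
```

Only the attributes on the path actually taken need to be present, which is why the tree reads like a field checklist: most mushrooms are decided by odor alone.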
Simplest rule-set (Benchmark)
A mushroom is poisonous if any of these rules holds:
1. odor is not almond, anise, or none
   (120 poisonous cases missed, 98.52% accuracy)

2. spore-print-color = green
   (48 cases missed, 99.41% accuracy)

3. odor = none and stalk-surface-below-ring = scaly
   and stalk-color-above-ring is not brown
   (8 cases missed, 99.90% accuracy)

4. habitat = leaves and cap-color = white
   (alternatively: population = clustered and cap-color = white)
   (100% accuracy)
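The benchmark rules can be applied in order, flagging a mushroom as poisonous as soon as one of them fires. The reported accuracies match accuracy = 1 − missed / 8124 for the number of cases still missed after each rule is added (for example 1 − 120/8124 ≈ 98.52%), which the `accuracy` helper reproduces. `predict_poisonous`, `accuracy` and the dictionary keys are hypothetical names following the UCI attribute names, with values spelled out rather than coded.

```python
TOTAL = 8124  # number of instances in the dataset

def predict_poisonous(m):
    """Return True if any benchmark rule classifies mushroom m as poisonous."""
    # Rule 1: any odor other than almond, anise or none
    if m.get("odor") not in {"almond", "anise", "none"}:
        return True
    # Rule 2: green spore print
    if m.get("spore-print-color") == "green":
        return True
    # Rule 3: odorless, scaly stalk below the ring, stalk above the ring not brown
    if (m.get("odor") == "none"
            and m.get("stalk-surface-below-ring") == "scaly"
            and m.get("stalk-color-above-ring") != "brown"):
        return True
    # Rule 4 (the slide's alternative uses population == "clustered" instead of habitat)
    if m.get("habitat") == "leaves" and m.get("cap-color") == "white":
        return True
    return False

def accuracy(missed, total=TOTAL):
    """Percent of instances classified correctly when `missed` are wrong."""
    return round(100 * (1 - missed / total), 2)

for rule_count, missed in [(1, 120), (2, 48), (3, 8), (4, 0)]:
    print(f"after rule {rule_count}: {accuracy(missed):.2f}%")
```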
Habitat Insights
Waste is safe, but stay away from paths.
[Chart by habitat: woods, grasses, leaves, meadows, paths, urban, waste]
Population Insights
Mushrooms travel safer in groups.
[Chart by population: abundant, clustered, numerous, scattered, several, solitary]
Information  Knowledge

         Population Data                                        %Rates vs. Mushrooms
                                                           120.00%

                                                           100.00%

                                                            80.00%

                                                            60.00%

                                                            40.00%

                                                            20.00%

Abundant Clustered Numerous Scattered Several   Solitary     0.00%




                                                                     % Poisonous   % Edible
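The per-population rates in this chart are group-wise percentages: for each population value, the share of its mushrooms labeled poisonous and the share labeled edible, which always sum to 100%. A minimal sketch, assuming records are dicts and labels use the UCI class codes "p" and "e"; `rates_by` is a hypothetical name and the toy data is invented.

```python
from collections import Counter, defaultdict

def rates_by(rows, labels, attr):
    """Return {value: (% poisonous, % edible)} for each value of `attr`.

    rows: list of dicts {attribute: value}; labels: parallel list of
    "p" (poisonous) / "e" (edible) class codes.
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[row[attr]][label] += 1
    rates = {}
    for value, counts in groups.items():
        n = sum(counts.values())
        rates[value] = (100 * counts["p"] / n, 100 * counts["e"] / n)
    return rates

# Hypothetical toy data, not the real dataset
rows = [{"population": "several"}] * 3 + [{"population": "abundant"}] * 2
labels = ["p", "p", "e", "e", "e"]
print(rates_by(rows, labels, "population"))
```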
Poisonous/Edible Ratio vs. Mushroom Population Density
[Chart: poisonous/edible ratio (y-axis, -50% to 300%) against mushroom density (x-axis, 0 to 7), with one point per population value (abundant, clustered, numerous, scattered, several, solitary)]
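Unlike the percentages on the previous slide, the poisonous/edible ratio divides two counts, so it exceeds 100% whenever a group has more poisonous than edible specimens, which is why the y-axis runs well past 100%. It is also undefined for a group with no edible specimens. A minimal sketch, assuming per-group counts are already available; `poison_edible_ratio` is a hypothetical name, and returning infinity for an all-poisonous group is a design choice of this sketch.

```python
import math

def poison_edible_ratio(counts):
    """Map each group to poisonous/edible as a percentage.

    counts: {group: (n_poisonous, n_edible)}. A group with no edible
    specimens gets math.inf instead of raising ZeroDivisionError.
    """
    return {
        group: (math.inf if edible == 0 else 100 * poisonous / edible)
        for group, (poisonous, edible) in counts.items()
    }

# Hypothetical counts, not the dataset's actual values
print(poison_edible_ratio({"A": (30, 10), "B": (0, 20), "C": (5, 0)}))
```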
Conclusions
• If it stinks, don't eat it: 98.52% accuracy.

• If it doesn't stink and its spore print color is not green, these two rules together are 99.41% accurate on the dataset.

• Odor and spore print color may be the best attributes statistically, but not in the field.
Future Work
• Use more easily identified attributes to classify mushrooms, producing a method for easier visual classification

• Eliminate nonvisual attributes

• Focus on visual-cue attributes, e.g. habitat, population, cap and stalk

• Compare the two methods
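One way to start on these items is to strip each record down to visual attributes before training, then run the same learners (1R, J48) on both the full and the reduced records and compare accuracy. The sketch below assumes records are dicts keyed by UCI attribute names; `VISUAL_PREFIXES`, `VISUAL_EXTRA` and `visual_only` are hypothetical names, and treating every cap-* and stalk-* attribute as visual is an assumption based on the examples listed above.

```python
# Assumption: "visual" attributes are cap-*, stalk-*, habitat and population,
# following the examples listed under Future Work.
VISUAL_PREFIXES = ("cap-", "stalk-")
VISUAL_EXTRA = {"habitat", "population"}

def visual_only(row):
    """Drop nonvisual attributes (e.g. odor, spore-print-color) from one record."""
    return {k: v for k, v in row.items()
            if k.startswith(VISUAL_PREFIXES) or k in VISUAL_EXTRA}

record = {"odor": "none", "spore-print-color": "white", "cap-color": "white",
          "stalk-root": "bulbous", "habitat": "leaves", "population": "clustered"}
print(visual_only(record))
```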

Project 2 Data Mining Part 1

  • 1. Project II Data Mining a Mushroom Dataset Group 1 Raymond Borges Jarilyn Hernandez
  • 2. The Mushroom Dataset Data Set Number of Multivariate 8124 Area: Life Characteristics: Instances: Attribute Number of Date Categorical 22 1987 Characteristics: Attributes: Donated: This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one.
  • 3. Mushroom Dataset  22 Independent attributes  1 Class Attribute (Can you eat it?) Edible(4,208)51.8% Poisonous(3,916)48.2%
  • 4. Mushroom Dataset 22 Attributes Total 18 Intrinsically on Mushroom 4 Others 1 Habitat 1 Population 1 Bruises 1 Odor
  • 5. Odor attribute, 1R Learner The Simplest Rule 98.52% Acc. A = almond N = none C = creosote P = pungent F = foul S = spicy L = anise Y = fishy M = musty a c f l m n p s y
  • 6. J48 Tree 100% E = Edible Classification P = Poisonous E P P E P P P P almond creosote foul anise musty none pungent spicy fishy E E E E P E E E black brown buff chocolate green orange purple white yellow E P E narrow broad close crowded distant E P E E E E abundant clustered numerous scattered several solitary
  • 7. Simplest rule-set (Benchmark) These are Poisonous 1. Odor = not almond or anise or none (120 poisonous cases missed, 98.52% accuracy) 2. Spore-print-color =green (48 cases missed, 99.41% accuracy) 3. Odor=none and stalk-surface-below-ring = scaly and stalk-color-above-ring= not brown (8 cases missed, 99.90% accuracy) 4. Habitat= leaves and cap-color=white 4. May also be population=clustered and cap-color=white (100% accuracy)
  • 8. Habitat Insights Waste is safe but stay away from paths Woods Grasses Leaves Meadows Paths Urban Waste
  • 9. Population Insights Mushrooms travel safer in groups Abundant Clustered Numerous Scattered Several Solitary
  • 10. Information  Knowledge Population Data %Rates vs. Mushrooms 120.00% 100.00% 80.00% 60.00% 40.00% 20.00% Abundant Clustered Numerous Scattered Several Solitary 0.00% % Poisonous % Edible
  • 11. Poisonous/Edible Ratio vs. Mushroom Population Density 300.00% 250.00% several Poisonous/Edible Ratio 200.00% 150.00% 100.00% 50.00% solitary scattered clustered 0.00% numerous abundant 0 1 2 3 4 5 6 7 -50.00% Mushroom Density
  • 12. Conclusions  If it stinks don’t eat it, 98.52% accuracy  Ifit doesn’t stink and it’s spore color is not green then you have a 99.41% chance of survival  Odor and spore color may be the best attributes statistically but not in the field
  • 13. Future Work  Use more easily identified attributes to classify mushrooms to produce a method of easier visual classification  Eliminate nonvisual attributes Focus on visual-queue attributes, e.g. habitat, population, cap and stalk  Compare the two methods

Speaker notes

  1. Visual cues