Data Mining Using WEKA



         Submitted to
    Prof. Prithwis Mukerjee


        Submitted By
       Shikha Jayaswal




        19th April, 2012
Table of Contents

Objective
WEKA
   Running WEKA
Loading Datasets
Linear Regression
   Model
   Interpreting the Output
Clustering
   Model
   Interpreting the Output

List of Figures:

Figure 1: Weka GUI Chooser
Figure 2: Weka Explorer
Figure 3: Load Dataset
Figure 4: Linear Regression

Objective

Demonstrate the use of WEKA for the following data mining tasks:

    •   Linear regression
    •   Clustering



WEKA

Weka is a data mining tool developed at the University of Waikato. It is distributed under the GNU
General Public License and is freely available. It is implemented in the Java programming language
and has a GUI for loading data, running analyses, and producing visualizations.

The software can be downloaded from: http://www.cs.waikato.ac.nz/~ml/weka/
The version used in this analysis is 3.6.6.


Running WEKA


Starting Weka opens the Weka GUI Chooser:




Figure 1: Weka GUI Chooser




The Explorer button opens the Weka Explorer window, through which data can be loaded and then
used for analysis.

Figure 2: Weka Explorer




Loading Datasets:

The supported file types are:

    •   ARFF data files
    •   C4.5 data files
    •   CSV data files
    •   LibSVM data files
    •   SVM light data files
    •   Binary serialized data files
    •   XRFF data files
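ARFF is Weka's native text format, so a small sketch of how such a file is laid out may help. The Python below writes a header (@relation, one @attribute line per column) followed by an @data section. The attribute names are the ones that appear later in this report; the relation name "units" and the two data rows are hypothetical values made up for illustration, not the study's data.

```python
def quote(name):
    """Wrap a name in single quotes if it contains a space, as ARFF expects."""
    return "'" + name + "'" if " " in name else name


def to_arff(relation, attributes, rows):
    """Return ARFF text for numeric attributes only (a simplifying assumption)."""
    lines = ["@relation " + quote(relation), ""]
    for name in attributes:
        lines.append("@attribute " + quote(name) + " numeric")
    lines.append("")
    lines.append("@data")
    for row in rows:
        # Each instance is one comma-separated line, in attribute order.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Attribute names taken from the report; the rows are hypothetical.
ATTRIBUTES = ["Price/Unit", "BTU/Hr", "Weight lbs", "EER",
              "Unit volume", "Region", "Type"]
ROWS = [
    [34.5, 500, 5.6, 4.0, 100000, 3, 2],
    [31.0, 350, 3.9, 4.5, 90000, 5, 2],
]

arff_text = to_arff("units", ATTRIBUTES, ROWS)
print(arff_text)
```

The resulting text can be saved with an .arff extension and opened through the Explorer like any other supported file.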


The data file being used for the study is:
Click “Open file...”, select the file to be loaded, and open it.




Figure 3: Load Dataset
Linear Regression
Model
Steps for creating the regression model:

   1. Click the Classify tab.
   2. Click the Choose button; in the window that opens, expand classifiers, then functions, and
      select LinearRegression.
   3. Click the LinearRegression text area to open the GenericObjectEditor pop-up; in the
      attributeSelectionMethod dropdown, select No Attribute Selection and click OK.
   4. Select Use Training Set to evaluate the model on the loaded dataset.
   5. In the dropdown, select Price/Unit as the dependent variable and click the Start button.




   Figure 4: Linear Regression




Interpreting the Output


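The fitted regression model reported by Weka is:

Price/Unit = -0.0012 * BTU/Hr + 0.5806 * Weight lbs + 3.7411 * EER + 0 * Unit volume
             - 1.2524 * Region - 2.1025 * Type + 24.8058

Each coefficient is the change in predicted Price/Unit for a one-unit increase in that attribute
with the other attributes held fixed. The Unit volume coefficient is printed as 0 at this precision.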
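To make the equation concrete, the sketch below evaluates it in Python for a single unit. The coefficients and intercept are copied from the model output above; the attribute values of the example unit are hypothetical and chosen only to show the arithmetic.

```python
# Coefficients and intercept as printed in the regression output.
COEFFICIENTS = {
    "BTU/Hr": -0.0012,
    "Weight lbs": 0.5806,
    "EER": 3.7411,
    "Unit volume": 0.0,   # printed as 0 in the output
    "Region": -1.2524,
    "Type": -2.1025,
}
INTERCEPT = 24.8058


def predict_price_per_unit(unit):
    """Apply the linear model: intercept plus the sum of coefficient * value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * unit[name] for name in COEFFICIENTS)


# Hypothetical unit, not a row from the study's dataset.
example_unit = {
    "BTU/Hr": 500,
    "Weight lbs": 5.6,
    "EER": 4.0,
    "Unit volume": 100000,
    "Region": 3,
    "Type": 2,
}

predicted = predict_price_per_unit(example_unit)
print(round(predicted, 4))  # 34.4594
```

For this unit the terms are -0.6 (BTU/Hr), 3.25136 (Weight lbs), 14.9644 (EER), 0 (Unit volume), -3.7572 (Region), and -4.205 (Type), which together with the intercept give about 34.46.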
Clustering
Model
Steps for creating the clustering model:

    1. Click the Cluster tab.
    2. Click the Choose button; in the window that opens, expand clusterers and select EM.
    3. Click the EM text area to open the GenericObjectEditor pop-up, fill in the clustering
       options listed below, and click OK.
            a. -V Verbose output.
            b. -N The number of clusters to generate. If omitted, EM uses cross-validation to
                select the number of clusters automatically.
            c. -I Terminate after this many iterations if EM has not converged.
            d. -S The random number seed.
            e. -M The minimum allowable standard deviation for the normal density calculation.
    4. Select Use Training Set to use the loaded dataset and click the Start button.
Interpreting the Output


The Clustered Instances:

   Cluster      Instances
      0          7 (16%)
      1         14 (31%)
      2         10 (22%)
      3          3 (7%)
      4         11 (24%)
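The percentage beside each count is that cluster's share of the 45 instances in total. A minimal Python check, using only the counts listed above:

```python
# Instance counts per cluster, copied from the Clustered Instances output.
COUNTS = {0: 7, 1: 14, 2: 10, 3: 3, 4: 11}


def cluster_shares(counts):
    """Return each cluster's share of all instances as a whole-number percentage."""
    total = sum(counts.values())
    return {cluster: round(100 * n / total) for cluster, n in counts.items()}


shares = cluster_shares(COUNTS)
for cluster, pct in shares.items():
    print(cluster, COUNTS[cluster], f"({pct}%)")
```

With these counts, cluster 3 (3 of 45 instances) works out to about 7%.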


The attributes of the clusters are:

Attribute    Statistic   Cluster 0  Cluster 1  Cluster 2  Cluster 3  Cluster 4
Cluster prior                 0.16        0.3        0.2       0.07       0.27
Price/Unit   mean          34.1022    32.5883    39.1963    38.0867    30.9768
             std. dev.      4.1176     1.2413     2.2264     1.0193     2.8369
BTU/Hr       mean         912.8122   499.9553   496.4343   856.6667   347.0964
             std. dev.    105.4301   159.6201   178.5667    57.9272   140.3392
Weight lbs.  mean          10.4966     5.6066     5.6444     9.5967     3.9301
             std. dev.      1.3785      1.848     2.0181     0.7312      1.559
EER          mean           3.3643     3.9673     4.9873     4.8533     4.4754
             std. dev.      0.2773     0.3885     0.3347     0.1586     0.3313
Unit Volume  mean         180985.9   129223.9   71417.94      74000   92473.04
             std. dev.    239037.4   135545.2   45108.85    44639.3   85150.53
Region       mean                3     3.1226          4          5     4.8882
             std. dev.      0.8848     0.4794          0     0.8848      0.365
Type         mean           1.1427          2          2     1.3333          2
             std. dev.      0.3497     0.3866     0.3866     0.4714     0.3866
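One way to read the table above is as a set of per-cluster normal distributions: a new unit can be scored against each cluster by combining the cluster's prior with the normal density of each attribute value. The Python sketch below does this for four of the attributes, treating them as independent within a cluster. This is an illustrative simplification, not necessarily the exact computation Weka performs; Region is left out because its standard deviation is printed as 0 for cluster 2, which would make the density undefined. The example units are hypothetical.

```python
import math

# Cluster priors and (mean, std. dev.) per cluster, copied from the table above.
PRIORS = [0.16, 0.3, 0.2, 0.07, 0.27]
PARAMS = {
    "Price/Unit": [(34.1022, 4.1176), (32.5883, 1.2413), (39.1963, 2.2264),
                   (38.0867, 1.0193), (30.9768, 2.8369)],
    "BTU/Hr": [(912.8122, 105.4301), (499.9553, 159.6201), (496.4343, 178.5667),
               (856.6667, 57.9272), (347.0964, 140.3392)],
    "Weight lbs.": [(10.4966, 1.3785), (5.6066, 1.848), (5.6444, 2.0181),
                    (9.5967, 0.7312), (3.9301, 1.559)],
    "EER": [(3.3643, 0.2773), (3.9673, 0.3885), (4.9873, 0.3347),
            (4.8533, 0.1586), (4.4754, 0.3313)],
}


def log_normal_pdf(x, mean, std):
    """Log of the normal density, used to avoid underflow when multiplying."""
    return -math.log(std) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mean) / std) ** 2


def cluster_log_scores(instance):
    """Log prior plus summed log densities for each cluster (independence assumed)."""
    scores = []
    for k, prior in enumerate(PRIORS):
        score = math.log(prior)
        for name, value in instance.items():
            mean, std = PARAMS[name][k]
            score += log_normal_pdf(value, mean, std)
        scores.append(score)
    return scores


def most_likely_cluster(instance):
    """Index of the cluster with the highest score."""
    scores = cluster_log_scores(instance)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical units: a heavy, high-BTU unit and a light, low-BTU unit.
heavy_unit = {"Price/Unit": 34.1, "BTU/Hr": 912.8, "Weight lbs.": 10.5, "EER": 3.36}
light_unit = {"Price/Unit": 31.0, "BTU/Hr": 347.1, "Weight lbs.": 3.93, "EER": 4.48}
print(most_likely_cluster(heavy_unit), most_likely_cluster(light_unit))
```

The heavy, high-BTU unit sits near the cluster 0 means and the light, low-BTU unit near the cluster 4 means, which matches how the table separates those two groups.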
