Weka
Data Mining Using WEKA
Submitted to
Prof. Prithwis Mukerjee
Submitted By
Shikha Jayaswal
19th April, 2012
Table of Contents
Objective
WEKA
    Running WEKA
Loading Datasets
Linear Regression
    Model
    Interpreting the Output
Clustering
    Model
    Interpreting the Output
List of Figures:
Figure 1: Weka GUI Chooser
Figure 2: Weka Explorer
Figure 3: Load Dataset
Figure 4: Linear Regression
Objective
Exhibit the use of WEKA in performing the following data mining tasks:
• Linear Regression
• Clustering
WEKA
Weka is a data mining tool developed at the University of Waikato. It is released under the GNU
General Public License and is freely available. It is implemented in the Java programming language
and has a GUI for loading data, running analyses and producing visualizations.
The software can be downloaded from: http://www.cs.waikato.ac.nz/~ml/weka/
The version used in this analysis is 3.6.6.
Running WEKA
On launching Weka, the Weka GUI Chooser appears:
Figure 1: Weka GUI Chooser
The Explorer button opens the Weka Explorer window, through which data can be loaded and then
used for analysis.
Figure 2: Weka Explorer
Loading Datasets:
The supported file types are:
• ARFF data files
• C4.5 data files
• CSV data files
• LibSVM data files
• SVM Light data files
• Binary serialized data files
• XRFF data files
The data file being used for the study is:
Click “Open file...”, then select the file to be loaded and open it.
Figure 3: Load Dataset
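As an illustration of the ARFF format listed among the supported file types, the following minimal Python sketch converts CSV text into ARFF text, which consists of a @relation line, one @attribute line per column and a @data section. It assumes that every column is numeric. The function name csv_to_arff, the relation name "sample" and the data values are made up for illustration and are not the study's data.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row, numeric columns only) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    data = [row for row in rows[1:] if row]  # skip blank lines

    lines = ["@relation " + relation, ""]
    for name in header:
        # Attribute names are quoted because some contain spaces or slashes.
        lines.append("@attribute '" + name + "' numeric")
    lines.append("")
    lines.append("@data")
    for row in data:
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Made-up sample values, for illustration only (not the study's data).
sample_csv = "Price/Unit,BTU/Hr,EER\n30.5,500,4.0\n35.0,900,3.4\n"
arff_text = csv_to_arff(sample_csv, "sample")
print(arff_text)
```

The resulting text can be saved with an .arff extension and opened through “Open file...”. Since the Explorer also reads CSV files, the conversion is optional.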
Linear Regression
Model
Steps for creating the regression model:
1. Click the Classify tab.
2. Click the Choose button; in the window that opens, expand classifiers, then functions, and
select LinearRegression.
3. Click the LinearRegression text area to open the GenericObjectEditor pop-up; in the
attributeSelectionMethod dropdown select No attribute selection, then click OK.
4. Select Use training set so that the model is evaluated on the loaded dataset.
5. In the dropdown, select Price/Unit as the dependent variable and click the Start button.
Figure 4: Linear Regression
Interpreting the Output
The fitted regression equation is:
Price/Unit = -0.0012 * BTU/Hr + 0.5806 * Weight lbs + 3.7411 * EER + 0 * Unit volume
             - 1.2524 * Region - 2.1025 * Type + 24.8058
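To show how the equation is used, the minimal Python sketch below evaluates it for a single instance. The coefficients are copied from the output above; the function name predict_price_per_unit and the example instance are hypothetical and are not taken from the dataset. The Unit volume coefficient is printed as 0, so it contributes nothing here, although the unrounded coefficient may be small rather than exactly zero.

```python
# Coefficients copied from the regression output above.
COEFFICIENTS = {
    "BTU/Hr": -0.0012,
    "Weight lbs": 0.5806,
    "EER": 3.7411,
    "Unit volume": 0.0,  # printed as 0 at the displayed precision
    "Region": -1.2524,
    "Type": -2.1025,
}
INTERCEPT = 24.8058


def predict_price_per_unit(instance):
    """Return the predicted Price/Unit for a dict of attribute values."""
    return INTERCEPT + sum(
        coef * instance[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical instance, for illustration only.
example = {
    "BTU/Hr": 500,
    "Weight lbs": 5.0,
    "EER": 4.0,
    "Unit volume": 100000,
    "Region": 3,
    "Type": 2,
}
predicted = predict_price_per_unit(example)
print(round(predicted, 4))
```

Read this way, each coefficient is the change in predicted Price/Unit for a one-unit increase in that attribute with the others held fixed. For example, each additional EER point adds about 3.74 to the prediction.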
Clustering
Model
Steps for creating the clustering model:
1. Click the Cluster tab.
2. Click the Choose button; in the window that opens, expand clusterers and select EM.
3. Click the EM text area to open the GenericObjectEditor pop-up, fill in the clustering
options listed below, and click OK.
a. -V Verbose.
b. -N The number of clusters to generate. If omitted, EM will use cross-validation to
select the number of clusters automatically.
c. -I Terminate after this many iterations if EM has not converged.
d. -S Specify the random number seed.
e. -M Set the minimum allowable standard deviation for the normal density calculation.
4. Select Use training set to use the loaded dataset and click the Start button.
Interpreting the Output
The clustered instances:
Cluster   Instances
0          7 (16%)
1         14 (31%)
2         10 (22%)
3          3 (7%)
4         11 (24%)
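The percentages follow directly from the instance counts. The short Python sketch below recomputes them, assuming the five counts listed above cover the whole dataset (45 instances in total). On that assumption, cluster 3 holds about 7% of the instances.

```python
# Instance counts per cluster, copied from the output above.
counts = {0: 7, 1: 14, 2: 10, 3: 3, 4: 11}

total = sum(counts.values())
# Share of each cluster, rounded to the nearest whole percent.
percentages = {
    cluster: round(100 * n / total) for cluster, n in counts.items()
}

for cluster, n in counts.items():
    print(cluster, n, str(percentages[cluster]) + "%")
```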
The attributes of the clusters are:
Cluster                         0         1         2         3         4
Attribute                  (0.16)     (0.3)     (0.2)    (0.07)    (0.27)
Price/Unit   mean         34.1022   32.5883   39.1963   38.0867   30.9768
             std. dev.     4.1176    1.2413    2.2264    1.0193    2.8369
BTU/Hr       mean        912.8122  499.9553  496.4343  856.6667  347.0964
             std. dev.   105.4301  159.6201  178.5667   57.9272  140.3392
Weight lbs.  mean         10.4966    5.6066    5.6444    9.5967    3.9301
             std. dev.     1.3785     1.848    2.0181    0.7312     1.559
EER          mean          3.3643    3.9673    4.9873    4.8533    4.4754
             std. dev.     0.2773    0.3885    0.3347    0.1586    0.3313
Unit Volume  mean        180985.9  129223.9  71417.94     74000  92473.04
             std. dev.   239037.4  135545.2  45108.85   44639.3  85150.53
Region       mean               3    3.1226         4         5    4.8882
             std. dev.     0.8848    0.4794         0    0.8848     0.365
Type         mean          1.1427         2         2    1.3333         2
             std. dev.     0.3497    0.3866    0.3866    0.4714    0.3866
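One way to read these statistics is that each cluster is described by a normal distribution per attribute. The Python sketch below uses them to pick the most likely cluster for a new instance. It makes several assumptions: the row of values under the cluster numbers (0.16, 0.3, 0.2, 0.07, 0.27) is taken to be the cluster prior probabilities, attributes are treated as independent within a cluster, and only Price/Unit and EER are used for brevity. The function names are hypothetical, and this is an illustration rather than a reproduction of Weka's internal computation.

```python
import math

# cluster: (assumed prior, (mean, std. dev.) of Price/Unit, (mean, std. dev.) of EER)
CLUSTERS = {
    0: (0.16, (34.1022, 4.1176), (3.3643, 0.2773)),
    1: (0.30, (32.5883, 1.2413), (3.9673, 0.3885)),
    2: (0.20, (39.1963, 2.2264), (4.9873, 0.3347)),
    3: (0.07, (38.0867, 1.0193), (4.8533, 0.1586)),
    4: (0.27, (30.9768, 2.8369), (4.4754, 0.3313)),
}


def log_normal_pdf(x, mean, std):
    """Log of the normal density with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def most_likely_cluster(price, eer):
    """Return the cluster with the highest log(prior) + log-likelihood."""
    scores = {}
    for cluster, (prior, (p_mean, p_std), (e_mean, e_std)) in CLUSTERS.items():
        scores[cluster] = (
            math.log(prior)
            + log_normal_pdf(price, p_mean, p_std)
            + log_normal_pdf(eer, e_mean, e_std)
        )
    return max(scores, key=scores.get)


# Hypothetical instance placed at the cluster 2 means.
print(most_likely_cluster(39.1963, 4.9873))
```

Working in log space avoids numerical underflow when many attributes are multiplied together. Extending the sketch to all seven attributes would require a floor on the standard deviation, since Region has a standard deviation of 0 in cluster 2; the -M option described earlier addresses this within Weka.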