VGSOM




WEKA – Data Mining
   Techniques
  Clustering and Regression




                 BY
         M.P.Vijaya Prabhu
           10BM60097
Contents
1.     INTRODUCTION
2.     CLUSTERING
     2.1      Data Visualization
3.     Regression Analysis
     3.1      Pricing the house
4.     References
WEKA – DATA MINING TECHNIQUES
    1. INTRODUCTION
        Weka, “Data Mining Software in Java”, takes its name from the Waikato Environment for
Knowledge Analysis. It is a collection of state-of-the-art machine learning algorithms and data
preprocessing tools written in Java, developed at the University of Waikato, New Zealand. It is free
software that runs on almost any platform and is available under the GNU General Public License.

        Weka lets the user carry out complex analyses interactively and visualize the results
effectively.

WEKA GUI appears like this




Advantages of using WEKA

    1) Built-in advanced algorithms
    2) Effective visualization of results
    3) Easy-to-use GUI
Let us demonstrate the use of WEKA with two examples, one on clustering (k-means) and one on
        regression.


    2. CLUSTERING
The data is a sample bank data set taken from an online source. Besides an id field, it contains the
following attributes:
        1) age numeric
        2) sex {FEMALE,MALE}
        3) region {INNER_CITY,TOWN,RURAL,SUBURBAN}
        4) income numeric
        5) married {NO,YES}
        6) children {0,1,2,3}
        7) car {NO,YES}
        8) save_act {NO,YES}
        9) current_act {NO,YES}
        10) mortgage {NO,YES}
        11) pep {YES,NO}


        Based on these data we need to cluster the customers into 6 groups and find out the
        characteristics of each group.

The sample data contains 600 instances. The objective is to cluster them using the k-means algorithm.
Once the data has been preprocessed, we can start clustering.


First, the data is loaded into WEKA and preprocessing can be done as shown below.
WEKA's SimpleKMeans algorithm automatically handles a mixture of categorical and
numerical attributes. When computing distances, as in our case, the built-in algorithm
automatically normalizes numerical attributes. Euclidean distance is used as the measure of
distance between instances and cluster centroids.
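
A minimal Python sketch of such a distance over mixed attributes may make this concrete. It assumes min-max scaling of numeric attributes to [0, 1] and a 0/1 mismatch penalty for categorical ones; it is an illustration of the idea, not WEKA's own code, and the three customers are hypothetical rather than rows from the bank data.

```python
import math

def min_max_ranges(rows, numeric_keys):
    """Return {key: (min, max)} for each numeric attribute."""
    return {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in numeric_keys}

def mixed_distance(a, b, ranges):
    """Euclidean-style distance over numeric and categorical attributes.

    Numeric attributes are rescaled to [0, 1] before differencing;
    categorical attributes contribute 0 when equal and 1 otherwise.
    """
    total = 0.0
    for key, value in a.items():
        if key in ranges:
            lo, hi = ranges[key]
            span = (hi - lo) or 1.0  # guard against constant columns
            diff = (value - lo) / span - (b[key] - lo) / span
        else:
            diff = 0.0 if value == b[key] else 1.0
        total += diff * diff
    return math.sqrt(total)

# Hypothetical customers, not taken from the bank data set.
customers = [
    {"age": 20, "income": 10000.0, "sex": "FEMALE", "married": "NO"},
    {"age": 60, "income": 50000.0, "sex": "MALE", "married": "NO"},
    {"age": 40, "income": 30000.0, "sex": "FEMALE", "married": "YES"},
]
ranges = min_max_ranges(customers, ["age", "income"])
d01 = mixed_distance(customers[0], customers[1], ranges)
d02 = mixed_distance(customers[0], customers[2], ranges)
print(round(d01, 4), round(d02, 4))
```

Without the rescaling step, income (in the tens of thousands) would swamp age and every categorical attribute in the distance.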




        After selecting SimpleKMeans we can open its advanced settings. We changed the number
of clusters from 2 to 6, to get 6 different clusters from the given data.
After the required details are given, “Use training set” is selected. Then we can click “Start”.
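
For readers who want to see what the algorithm does behind the GUI, here is a rough sketch of plain k-means on numeric points in Python. It is a simplified illustration, not SimpleKMeans itself: it handles numeric tuples only, and the function names, the sample points and the default values for `max_iterations` and `seed` are assumptions made for this example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, k, max_iterations=500, seed=10):
    """Plain k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen points as starting centroids
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignments = [nearest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Hypothetical two-dimensional points, for illustration only.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (9.0, 9.0), (9.2, 8.8), (8.9, 9.1)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)
```

The result depends on the random starting centroids, which is why the seed is worth recording alongside the number of clusters.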




The result is available as given below.
================================================================================================
OUTPUT :
=== Run information ===

Scheme:       weka.clusterers.SimpleKMeans -N 6 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation:     bank-data
Instances:    600
Attributes:   12
       id
       age
       sex
       region
       income
       married
       children
       car
       save_act
       current_act
       mortgage
       pep
Test mode: evaluate on training data


=== Clustering model (full training set) ===


kMeans
======

Number of iterations: 18
Within cluster sum of squared errors: 1955.4146634784236
Missing values globally replaced with mean/mode

Cluster centroids:
                                          Cluster#
Attribute      Full Data           0           1           2           3           4           5
                   (600)        (74)       (164)        (71)        (58)        (99)       (134)
================================================================================================
id               ID12101     ID12107     ID12103     ID12101     ID12104     ID12102     ID12108
age               42.395     42.9324     43.7744     39.0282     37.3103      38.404     47.3433
sex               FEMALE      FEMALE      FEMALE      FEMALE      FEMALE        MALE        MALE
region        INNER_CITY       RURAL  INNER_CITY  INNER_CITY        TOWN  INNER_CITY        TOWN
income        27524.0312  28838.7605  28586.4063  20463.1273  20600.8528   25720.037  33568.3929
married              YES          NO         YES         YES         YES         YES          NO
children          1.0117       1.973       0.628      0.6901      1.6207       0.899      0.9403
car                   NO          NO          NO          NO          NO         YES         YES
save_act             YES         YES         YES          NO          NO          NO         YES
current_act          YES         YES         YES         YES         YES         YES         YES
mortgage              NO          NO          NO          NO          NO         YES          NO
pep                   NO          NO          NO         YES          NO         YES         YES




Time taken to build model (full training data) : 0.16 seconds

=== Model and evaluation on training set ===

Clustered Instances

0   74 ( 12%)
1 164 ( 27%)
2   71 ( 12%)
3   58 ( 10%)
4   99 ( 17%)
5 134 ( 22%)
================================================================================================
The result window shows the centroid of each cluster as well as statistics on the number and
      percentage of instances assigned to different clusters.
                0   74 ( 12%)
                1   164 ( 27%)
                2   71 ( 12%)
                3   58 ( 10%)
                4   99 ( 17%)
                5   134 ( 22%)
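
As a quick check, the percentages can be recomputed from the cluster sizes. The short Python sketch below assumes that shares are rounded to the nearest whole percent with halves rounded up (an assumption, though it is consistent with 99 of 600 being reported as 17%); the helper name `percent_share` is made up for this example.

```python
# Cluster sizes as reported in the WEKA output above.
cluster_sizes = {0: 74, 1: 164, 2: 71, 3: 58, 4: 99, 5: 134}
total = sum(cluster_sizes.values())

def percent_share(count, total):
    """Share of `total` as a whole-number percentage, rounding halves up."""
    return (count * 100 + total // 2) // total

shares = {cluster: percent_share(n, total) for cluster, n in cluster_sizes.items()}
for cluster, n in cluster_sizes.items():
    print(f"{cluster}  {n:>4} ({shares[cluster]:>3}%)")
```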


      The output of this clustering can be summarized in the form of cluster centroids:

                                          Cluster
Attribute      Full Data           0           1           2           3           4           5
age               42.395     42.9324     43.7744     39.0282     37.3103      38.404     47.3433
sex               FEMALE      FEMALE      FEMALE      FEMALE      FEMALE        MALE        MALE
region        INNER_CITY       RURAL  INNER_CITY  INNER_CITY        TOWN  INNER_CITY        TOWN
income        27524.0312  28838.7605  28586.4063  20463.1273  20600.8528   25720.037  33568.3929
married              YES          NO         YES         YES         YES         YES          NO
children          1.0117       1.973       0.628      0.6901      1.6207       0.899      0.9403
car                   NO          NO          NO          NO          NO         YES         YES
save_act             YES         YES         YES          NO          NO          NO         YES
current_act          YES         YES         YES         YES         YES         YES         YES
mortgage              NO          NO          NO          NO          NO         YES          NO
pep                   NO          NO          NO         YES          NO         YES         YES


      For example, the centroid for cluster 0 in the WEKA output shows that this is a segment of cases
      representing middle-aged (approx. 43) females living in rural areas with an average income of
      approx. $28,800, who are unmarried with two children, etc. Furthermore, this group has on average
      said NO to the PEP product.


              2.1 Data Visualization

      The result can be viewed more intuitively using the visualization tools built into WEKA.

      The distribution of males and females in each cluster can be visualized with the following
      steps.

Step 1 : Right-click the entry in the result list and select “Visualize cluster assignments”.

Step 2 : Select Cluster as the X axis.

Step 3 : Select Instance_number as the Y axis.

Step 4 : Select “sex” as the colour, so that the points are coloured by sex.

This results in a visualization of the distribution of males and females in each cluster.
3. Regression Analysis
  Regression can be carried out in WEKA with a range of options. Let us explain it using a
  simple linear regression (“LinearRegression”).

3.1 Pricing the house

  The data is taken from an online source. The selling price of a house needs to be determined
  based on the data given. The data contains the following attributes.


  1) houseSize NUMERIC
  2) lotSize NUMERIC
  3) bedrooms NUMERIC
  4) granite NUMERIC
  5) bathroom NUMERIC
  6) sellingPrice NUMERIC


  So, based on the size of the house, the lot size, the number of bedrooms, whether it is furnished
  with granite, and the number of bathrooms, we need to predict the dependent variable, i.e. the
  selling price.


  First, the data is loaded into WEKA and any necessary preprocessing is done. Since our data is
  already processed, we proceed to selecting the type of regression.
In the Classify tab, choose “LinearRegression” (under functions) as the classifier. Then select
“Use training set” in the Test options.




Three other choices are available under Test options:
    Supplied test set: WEKA evaluates the model on a separate test data set that we supply.
    Cross-validation: WEKA builds models on subsets (folds) of the supplied data, tests each one
         on the held-out fold, and averages the results.
    Percentage split: WEKA builds the model on a given percentage of the data and tests it on
         the remainder.
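
The difference between the last two options can be sketched in a few lines of Python. This is only an illustration of how the data is partitioned, not WEKA's implementation; the 66% split, the 10 folds, the function names and the integer stand-ins for house records are all assumptions made for the example.

```python
import random

def percentage_split(rows, train_percent, seed=1):
    """Shuffle the rows and split them into (train, test) by percentage."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(rows, folds):
    """Yield (train, test) pairs; every row is tested exactly once."""
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        yield train, test

rows = list(range(700))  # stand-ins for the 700 house instances
train, test = percentage_split(rows, 66)
fold_sizes = [len(test_fold) for _, test_fold in k_fold_splits(rows, 10)]
print(len(train), len(test), fold_sizes)
```

With “Use training set”, by contrast, the same 700 instances serve for both fitting and evaluation, so the reported errors tend to be optimistic.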


Here the attribute “sellingPrice” is chosen as the one to predict. This means that with the
available data we are going to predict the dependent variable (selling price).


Then click on the “Start” button to build a model using WEKA.
OUTPUT:
================================================================================================
=== Run information ===
Scheme:      weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
Relation: house
Instances: 700
Attributes: 6
        houseSize
        lotSize
        bedrooms
        granite
        bathroom
        sellingPrice
Test mode: evaluate on training data
=== Classifier model (full training set) ===
Linear Regression Model
sellingPrice =
   22.6582 * houseSize +
    9.1242 * lotSize +
  42145.0767 * bedrooms +
  42562.0901 * bathroom +
 -20981.3142

Time taken to build model: 0.04 seconds

=== Evaluation on training set ===
=== Summary ===

Correlation coefficient      0.9945
Mean absolute error         4790.821
Root mean squared error       4245.4125
Relative absolute error      11.9082 %
Root relative squared error    11.21 %
Total Number of Instances       700
================================================================================================
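
The summary statistics can be read with the help of the following Python sketch. It uses the usual textbook definitions, with the relative errors measured against a baseline that always predicts the mean of the actual values; treating that as WEKA's exact baseline is an assumption, and the four actual/predicted pairs are made-up numbers, not the house data.

```python
import math

def regression_summary(actual, predicted):
    """Common regression error measures for paired actual/predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    # Baseline: always predict the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    # Pearson correlation between actual and predicted values.
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "correlation": cov / math.sqrt(base_sq * var_p),
        "mean_absolute_error": abs_err / n,
        "root_mean_squared_error": math.sqrt(sq_err / n),
        "relative_absolute_error_pct": 100 * abs_err / base_abs,
        "root_relative_squared_error_pct": 100 * math.sqrt(sq_err / base_sq),
    }

# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
summary = regression_summary(actual, predicted)
for name, value in summary.items():
    print(f"{name:32s} {value:.4f}")
```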


The output gives the selling price as
sellingPrice = (22.6582 * houseSize) + (9.1242 * lotSize) + (42145.0767 * bedrooms) +
  (42562.0901 * bathroom) - 20981.3142


  To determine the selling price of a house from given data, just plug the values into this
  equation.
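
Plugging in values can be sketched in Python as follows. The coefficients and intercept are copied from the WEKA output above; the function name and the example house are hypothetical and chosen only to show the arithmetic.

```python
# Coefficients and intercept copied from the WEKA linear regression output.
COEFFICIENTS = {
    "houseSize": 22.6582,
    "lotSize": 9.1242,
    "bedrooms": 42145.0767,
    "bathroom": 42562.0901,
}
INTERCEPT = -20981.3142

def predict_selling_price(house):
    """Apply the fitted linear model to one house given as a dict of attributes."""
    return INTERCEPT + sum(weight * house[name] for name, weight in COEFFICIENTS.items())

# Hypothetical house, not a row from the data set; granite is ignored by the model.
example_house = {"houseSize": 3000, "lotSize": 10000, "bedrooms": 4, "bathroom": 2, "granite": 1}
price = predict_selling_price(example_house)
print(f"Predicted selling price: {price:,.2f}")
```

For this made-up house the model gives a price of about 391,940.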


  Granite does not appear in the model: WEKA left it out, which suggests that it doesn't matter
  much for the selling price of the house.




4. References

  http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm
  www.cs.waikato.ac.nz/ml/weka/
  http://www.laits.utexas.edu/~norman/BUS.FOR/course.mat/Alex/
  http://maya.cs.depaul.edu/classes/ect584/weka/k-means.html
  http://www.cs.utexas.edu/users/ml/tutorials/Weka-tut/

Falcon Invoice Discounting: Unlock Your Business Potential
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 
PHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation FinalPHX May 2024 Corporate Presentation Final
PHX May 2024 Corporate Presentation Final
 
Phases of Negotiation .pptx
 Phases of Negotiation .pptx Phases of Negotiation .pptx
Phases of Negotiation .pptx
 
Falcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in indiaFalcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in india
 
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
 
Falcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to ProsperityFalcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to Prosperity
 
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
 
Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...
Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...
Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...
 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
 
Call Girls From Raj Nagar Extension Ghaziabad❤️8448577510 ⊹Best Escorts Servi...
Call Girls From Raj Nagar Extension Ghaziabad❤️8448577510 ⊹Best Escorts Servi...Call Girls From Raj Nagar Extension Ghaziabad❤️8448577510 ⊹Best Escorts Servi...
Call Girls From Raj Nagar Extension Ghaziabad❤️8448577510 ⊹Best Escorts Servi...
 
Cheap Rate Call Girls In Noida Sector 62 Metro 959961乂3876
Cheap Rate Call Girls In Noida Sector 62 Metro 959961乂3876Cheap Rate Call Girls In Noida Sector 62 Metro 959961乂3876
Cheap Rate Call Girls In Noida Sector 62 Metro 959961乂3876
 
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
 
Whitefield CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
Whitefield CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRLWhitefield CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
Whitefield CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
 
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 MonthsSEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
 

Clustering and Regression using WEKA

  • 1. VGSOM WEKA – Data Mining Techniques Clustering and Regression BY M.P.Vijaya Prabhu 10BM60097
  • 2. Contents: 1. Introduction (p. 3); 2. Clustering (p. 4); 2.1 Data Visualization (p. 8); 3. Regression Analysis (p. 10); 3.1 Pricing the house (p. 10); 4. References (p. 13)
  • 3. WEKA – DATA MINING TECHNIQUES
    1. INTRODUCTION
    “Data Mining Software in Java”. Weka, an acronym for Waikato Environment for Knowledge Analysis, is a collection of state-of-the-art machine learning algorithms and data preprocessing tools written in Java and developed at the University of Waikato, New Zealand. It is free software that runs on almost any platform and is available under the GNU General Public License.
    Weka lets the user carry out complex analyses interactively and visualize the results effectively.
    [Figure: the WEKA GUI]
    Advantages of using WEKA:
    1) Built-in advanced algorithms
    2) Effective visualization of results
    3) Easy-to-use GUI
  • 4. Let us demonstrate the use of WEKA with two examples, one on clustering (k-means) and one on regression.
    2. CLUSTERING
    The data is a sample bank data set taken from an online source. It contains the following attributes:
    1) age numeric
    2) sex {FEMALE,MALE}
    3) region {INNER_CITY,TOWN,RURAL,SUBURBAN}
    4) income numeric
    5) married {NO,YES}
    6) children {0,1,2,3}
    7) car {NO,YES}
    8) save_act {NO,YES}
    9) current_act {NO,YES}
    10) mortgage {NO,YES}
    11) pep {YES,NO}
    Based on these data we need to cluster the customers into 6 groups and find the characteristics of each group. The sample data contains 600 instances, and the clustering uses the k-means algorithm. First, the data is loaded into WEKA and preprocessed; once preprocessing is done, we can start clustering.
    [Figure: the bank data loaded into WEKA]
  • 5. The WEKA SimpleKMeans algorithm automatically handles a mixture of categorical and numerical attributes. When it computes distances, as in our case, the built-in algorithm automatically normalizes the numerical attributes. Euclidean distance is the measure used for the distance between instances and cluster centroids. After selecting SimpleKMeans we can open its advanced settings. We changed the number of clusters from 2 to 6 in order to get 6 different clusters from the given data.
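The distance computation described on this slide can be sketched as follows. This is a minimal illustration, not WEKA's implementation: it assumes that numeric attributes are min-max scaled to the range 0 to 1 and that a categorical mismatch contributes a difference of 1. The function name `mixed_distance`, the two sample customers and the attribute ranges are hypothetical values chosen for the example, not taken from the bank data.

```python
import math

def mixed_distance(a, b, numeric_ranges):
    """Euclidean-style distance over numeric and categorical attributes.

    Assumption: numeric attributes are min-max scaled and a categorical
    mismatch counts as a difference of 1.
    """
    total = 0.0
    for name, value_a in a.items():
        value_b = b[name]
        if name in numeric_ranges:
            low, high = numeric_ranges[name]
            span = high - low
            # Dividing by the range puts every numeric attribute on a 0..1 scale
            diff = 0.0 if span == 0 else (value_a - value_b) / span
        else:
            # Categorical attribute: 0 if the values match, 1 otherwise
            diff = 0.0 if value_a == value_b else 1.0
        total += diff * diff
    return math.sqrt(total)

# Hypothetical customers and attribute ranges, for illustration only
customer_a = {"age": 20, "income": 10000.0, "sex": "MALE"}
customer_b = {"age": 60, "income": 30000.0, "sex": "FEMALE"}
ranges = {"age": (20, 60), "income": (10000.0, 30000.0)}

print(mixed_distance(customer_a, customer_b, ranges))
```

Without the scaling step, income (tens of thousands) would dominate age (tens) in the distance, which is why normalization matters for this data set.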
  • 6. After the required details are given, “Use training set” is selected. Then we can click “Start”. The result is given below.
    OUTPUT:
    === Run information ===
    Scheme: weka.clusterers.SimpleKMeans -N 6 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10
    Relation: bank-data
    Instances: 600
    Attributes: 12
        id
  • 7. Attributes (continued): age, sex, region, income, married, children, car, save_act, current_act, mortgage, pep
    Test mode: evaluate on training data
    === Clustering model (full training set) ===
    kMeans
    ======
    Number of iterations: 18
    Within cluster sum of squared errors: 1955.4146634784236
    Missing values globally replaced with mean/mode
    Cluster centroids:
                                 Cluster#
    Attribute    Full Data   0           1           2           3           4           5
                 (600)       (74)        (164)       (71)        (58)        (99)        (134)
    id           ID12101     ID12107     ID12103     ID12101     ID12104     ID12102     ID12108
    age          42.395      42.9324     43.7744     39.0282     37.3103     38.404      47.3433
    sex          FEMALE      FEMALE      FEMALE      FEMALE      FEMALE      MALE        MALE
    region       INNER_CITY  RURAL       INNER_CITY  INNER_CITY  TOWN        INNER_CITY  TOWN
    income       27524.0312  28838.7605  28586.4063  20463.1273  20600.8528  25720.037   33568.3929
    married      YES         NO          YES         YES         YES         YES         NO
    children     1.0117      1.973       0.628       0.6901      1.6207      0.899       0.9403
    car          NO          NO          NO          NO          NO          YES         YES
    save_act     YES         YES         YES         NO          NO          NO          YES
    current_act  YES         YES         YES         YES         YES         YES         YES
    mortgage     NO          NO          NO          NO          NO          YES         NO
    pep          NO          NO          NO          YES         NO          YES         YES
    Time taken to build model (full training data) : 0.16 seconds
    === Model and evaluation on training set ===
    Clustered Instances
    0     74 ( 12%)
    1    164 ( 27%)
    2     71 ( 12%)
    3     58 ( 10%)
    4     99 ( 17%)
    5    134 ( 22%)
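The “Clustered Instances” percentages in the output above can be re-derived from the cluster sizes. The sketch below assumes the shares are rounded half up to whole percentages, which is an assumption about the display and not something the output states; `cluster_shares` is a hypothetical helper name.

```python
def cluster_shares(sizes):
    """Return each cluster's share of all instances as a whole percentage."""
    total = sum(sizes)
    # Integer arithmetic with round-half-up (assumed rounding rule);
    # this avoids floating-point surprises at exact .5 boundaries
    return [(200 * n + total) // (2 * total) for n in sizes]

# Cluster sizes for clusters 0..5, copied from the output above
sizes = [74, 164, 71, 58, 99, 134]

print(sum(sizes))             # 600
print(cluster_shares(sizes))  # [12, 27, 12, 10, 17, 22]
```

The sizes add up to the 600 instances of the data set, and the computed shares agree with the percentages WEKA reports.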
  • 8. The result window shows the centroid of each cluster as well as statistics on the number and percentage of instances assigned to the different clusters.
    0     74 ( 12%)
    1    164 ( 27%)
    2     71 ( 12%)
    3     58 ( 10%)
    4     99 ( 17%)
    5    134 ( 22%)
    The output of this clustering is given in the form of cluster centroids:
    Attribute    Full Data   0           1           2           3           4           5
    age          42.395      42.9324     43.7744     39.0282     37.3103     38.404      47.3433
    sex          FEMALE      FEMALE      FEMALE      FEMALE      FEMALE      MALE        MALE
    region       INNER_CITY  RURAL       INNER_CITY  INNER_CITY  TOWN        INNER_CITY  TOWN
    income       27524.0312  28838.7605  28586.4063  20463.1273  20600.8528  25720.037   33568.3929
    married      YES         NO          YES         YES         YES         YES         NO
    children     1.0117      1.973       0.628       0.6901      1.6207      0.899       0.9403
    car          NO          NO          NO          NO          NO          YES         YES
    save_act     YES         YES         YES         NO          NO          NO          YES
    current_act  YES         YES         YES         YES         YES         YES         YES
    mortgage     NO          NO          NO          NO          NO          YES         NO
    pep          NO          NO          NO          YES         NO          YES         YES
    For example, the centroid for cluster 0 shows that this is a segment of cases representing middle-aged (approx. 43) females living in rural areas with an average income of approx. $28,800, who are unmarried with about two children, etc. Furthermore, this group has on average said NO to the PEP product.
    2.1 Data Visualization
    The result can be viewed more intuitively with the visualization tools built into WEKA. The distribution of males and females in each cluster can be visualized with the following steps.
    Step 1: Right-click on the output and select “Visualize cluster assignments”.
  • 9. Step 2: Select Cluster as the X axis.
    Step 3: Select Instance_number as the Y axis.
    Step 4: Select “sex” as the colour, so that the points are coloured by sex.
    This results in a visualization of the distribution of males and females in each cluster.
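The k-means procedure used in this section can be sketched in a few lines. This is a simplified illustration and not WEKA's SimpleKMeans: it handles only numeric tuples, takes the starting centroids from the caller instead of seeding them randomly, and ignores missing values. The names `kmeans`, `within_cluster_sse` and the four sample points are hypothetical; the default of 500 iterations mirrors the `-I 500` option in the run shown earlier.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, max_iter=500):
    """Plain k-means on numeric tuples, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(pt, centroids[c]))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members
        updated = []
        for c, old in enumerate(centroids):
            members = [pt for pt, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

def within_cluster_sse(points, centroids, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(pt, centroids[label])
               for pt, label in zip(points, labels))

# Hypothetical two-dimensional points forming two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (12.0, 12.0)])
print(centroids, labels, within_cluster_sse(points, centroids, labels))
```

The last function corresponds to the “Within cluster sum of squared errors” figure in the WEKA output: a lower value means the instances sit closer to their centroids.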
  • 10. 3. Regression Analysis
    WEKA offers many options for regression. Let us illustrate it with a simple “LinearRegression” example.
    3.1 Pricing the house
    The data is taken from an online source. The selling price of a house needs to be determined from the given data, which contains the following attributes:
    1) houseSize NUMERIC
    2) lotSize NUMERIC
    3) bedrooms NUMERIC
    4) granite NUMERIC
    5) bathroom NUMERIC
    6) sellingPrice NUMERIC
    So, based on the size of the house, the lot size, the number of bedrooms, whether the house is furnished with granite, and the number of bathrooms, we need to predict the dependent variable, i.e. the selling price. First, the data is loaded into WEKA and any necessary preprocessing is done. Since our data is already processed, we proceed to selecting the type of regression.
  • 11. In the screen shown above, select “LinearRegression”. Then select “Use training set” in the Test options. Three other test options are available when doing simple linear regression:
      • Supplied test set: the model is evaluated on a separate set of test data that you supply.
  • 12.
      • Cross-validation: WEKA splits the supplied data into subsets (folds), repeatedly builds a model on all folds but one, tests it on the fold left out, and averages the results.
      • Percentage split: WEKA builds the model on a given percentage of the data and tests it on the remainder.
    Here the column sellingPrice is chosen as the dependent variable, i.e. the value that the model will predict from the remaining attributes. Then click on the “Start” button to build a model using WEKA.
    OUTPUT:
    === Run information ===
    Scheme: weka.classifiers.functions.LinearRegression -S 0 -R 1.0E-8
    Relation: house
    Instances: 700
    Attributes: 6
        houseSize
        lotSize
        bedrooms
        granite
        bathroom
        sellingPrice
    Test mode: evaluate on training data
    === Classifier model (full training set) ===
    Linear Regression Model
    sellingPrice =
         22.6582 * houseSize +
          9.1242 * lotSize +
      42145.0767 * bedrooms +
      42562.0901 * bathroom +
     -20981.3142
    Time taken to build model: 0.04 seconds
    === Evaluation on training set ===
    === Summary ===
    Correlation coefficient            0.9945
    Mean absolute error             4790.821
    Root mean squared error         4245.4125
    Relative absolute error           11.9082 %
    Root relative squared error       11.21 %
    Total Number of Instances        700
    The output predicts that the selling price will be
  • 13. sellingPrice = (22.6582 * houseSize) + (9.1242 * lotSize) + (42145.0767 * bedrooms) + (42562.0901 * bathroom) - 20981.3142.
    To determine the selling price of a house from the given data, simply plug its values into this equation. The granite attribute does not appear in the model, which indicates that it has little effect on the selling price of the house.
    4. References
    http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm
    www.cs.waikato.ac.nz/ml/weka/
    http://www.laits.utexas.edu/~norman/BUS.FOR/course.mat/Alex/
    http://maya.cs.depaul.edu/classes/ect584/weka/k-means.html
    http://www.cs.utexas.edu/users/ml/tutorials/Weka-tut/
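The “plug in” step for the regression equation in section 3.1 can be written as a small function. The coefficients are copied from the WEKA output; the function name `predict_selling_price` and the example house (3000 units of house size, a lot of 10000, 4 bedrooms, 3 bathrooms) are hypothetical and serve only to show the calculation.

```python
def predict_selling_price(house_size, lot_size, bedrooms, bathroom):
    """Apply the linear model reported by WEKA for the house data."""
    return (22.6582 * house_size
            + 9.1242 * lot_size
            + 42145.0767 * bedrooms
            + 42562.0901 * bathroom
            - 20981.3142)

# Hypothetical house, for illustration only
price = predict_selling_price(house_size=3000, lot_size=10000, bedrooms=4, bathroom=3)
print(round(price, 2))
```

Each coefficient is the change in predicted price for a one-unit increase in that attribute with the others held fixed, so one extra bedroom adds about 42,145 to the estimate.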