IT FOR BUSINESS INTELLIGENCE




Data Analysis Techniques Using WEKA: Classification and Regression
Nikhil Yagnic (07AG3801)

Introduction
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software
written in Java, developed at the University of Waikato, New Zealand. Weka is free software
available under the GNU General Public License.

The Weka workbench[1] contains a collection of visualization tools and algorithms for data analysis
and predictive modelling, together with graphical user interfaces for easy access to this functionality.
The original non-Java version of Weka was a TCL/TK front-end to (mostly third-party) modelling
algorithms implemented in other programming languages, plus data pre-processing utilities in C, and
a Makefile-based system for running machine learning experiments. This original version was
primarily designed as a tool for analyzing data from agricultural domains,[2][3] but the more recent
fully Java-based version (Weka 3), for which development started in 1997, is now used in many
different application areas, in particular for educational purposes and research. Advantages of Weka
include:

  • free availability under the GNU General Public License
  • portability, since it is fully implemented in the Java programming language and thus runs on
    almost any modern computing platform
  • a comprehensive collection of data pre-processing and modelling techniques
  • ease of use due to its graphical user interfaces

Weka supports several standard data mining tasks, more specifically, data pre-processing, clustering,
classification, regression, visualization, and feature selection. All of Weka's techniques are
predicated on the assumption that the data is available as a single flat file or relation, where each
data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but
some other attribute types are also supported). Weka provides access to SQL databases using Java
Database Connectivity and can process the result returned by a database query. It is not capable of
multi-relational data mining, but there is separate software for converting a collection of linked
database tables into a single table that is suitable for processing using Weka.[4] Another important
area that is currently not covered by the algorithms included in the Weka distribution is sequence
modelling.


Classification via decision trees using WEKA

Problem:
A bank is introducing a new financial product and wants to predict whether each new customer will
buy it. The bank already holds data on its existing clients, including whether each of them was
interested in buying the product.

Classification is a statistical technique that assigns a new client to one of the existing groups. It
builds a model from the available training data (the existing clients) and then classifies new data
using that model.

Steps to do classification in WEKA
Step 1: Create a data file in ARFF or CSV format; Weka reads both. This example uses a CSV file,
bank.csv.
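The relationship between the two formats can be sketched in a few lines of Python. This is only an illustration of the ARFF layout (a relation name, one attribute declaration per column, then the data rows). The column names age, income and buys and the sample rows are assumptions made for the example, not the contents of the author's bank.csv.

```python
import csv
import io


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) into ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        column = [row[i] for row in data]
        if all(is_number(value) for value in column):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: list the distinct values in first-seen order.
            values = list(dict.fromkeys(column))
            lines.append("@attribute " + name + " {" + ",".join(values) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical sample data; the real bank.csv has its own columns.
SAMPLE_CSV = "age,income,buys\n25,30000,no\n47,52000,yes\n33,41000,yes\n"
print(csv_to_arff(SAMPLE_CSV, "bank"))
```

The sketch ignores quoting, missing values and date or string attributes, which a real conversion would need to handle.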
Step 2: Open the Weka application. This shows the following screen.




Step 3: Load the data into WEKA

Click the “Open file” button and browse for the bank.csv file. Weka then lists all the attributes,
as shown in the figure below.
Step 4: View the data

In the “Selected attribute” panel you can see the values of the selected attribute along with its
name, type, etc.

You can also visualize the frequency distributions of all the attributes at once by clicking the
“Visualize All” button. It shows the following screen.
The “Visualize All” view shows the range of data and the frequency distribution for each attribute,
while the “Selected attribute” panel reports statistics such as the minimum, maximum and mean.
For example, age in our data ranges from 18 to 67, with a mean of 42.5.
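The same kind of summary can be computed by hand. The sketch below uses Python's standard library. The age values and labels are hypothetical, picked only so that the range and mean match the figures quoted above, and are not the author's data.

```python
from collections import Counter
from statistics import mean, median


def summarize(values):
    """Return basic summary statistics for a numeric attribute."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }


# Hypothetical ages chosen to span 18 to 67 with a mean of 42.5.
ages = [18, 35, 50, 67]
print(summarize(ages))

# For a nominal attribute, the frequency of each value is the useful summary.
buys = ["yes", "no", "yes", "yes"]  # hypothetical labels
print(Counter(buys))
```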

Step 5: Classify the data

Select the “Classify” tab, which shows the following screen.




Then click the “Choose” button and select the J48 algorithm under the “trees” node.
This shows the following screen.
Step 6: Run the classification algorithm

Select the dependent variable (the class attribute to be predicted) and click “Start”.

The “Classifier output” panel then shows an ASCII version of the tree.

The ASCII tree is difficult to read. To view it graphically, right-click the trees.J48 entry in the
result list and select the “Visualize tree” option. Right-clicking the resulting window and selecting
the full-screen option gives the following screen.
Step 7: Analyze the model created from the existing data

From the classifier output we can see that the classification accuracy of the model is 89%.

This means the model predicted the class correctly for 89% of the evaluated instances. If new
customers resemble the existing ones, we can expect a prediction of a new customer's buying
decision to be correct with a probability of roughly 0.89.
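As a rough illustration of what the accuracy figure measures, the sketch below compares actual and predicted labels. The label lists are hypothetical, constructed to give 89 correct predictions out of 100, and are not taken from the author's bank data.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, i.e. the confusion matrix cells."""
    return Counter(zip(actual, predicted))


# Hypothetical labels: 50 buyers and 50 non-buyers, 89 predicted correctly.
actual = ["yes"] * 50 + ["no"] * 50
predicted = ["yes"] * 44 + ["no"] * 6 + ["no"] * 45 + ["yes"] * 5

print("accuracy:", accuracy(actual, predicted))
print(confusion_counts(actual, predicted))
```

The confusion counts show where the errors fall (buyers predicted as non-buyers and the reverse), which the single accuracy figure hides.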

Step 8: Test the new customer data

Create your new customer data in ARFF or CSV format with the same attributes as the training data.

Select the “Supplied test set” radio button and click “Set...” to browse for the new data set.
Then click “Start”, which rebuilds the tree from the training data and applies it to the new data.

Save the classification result as an ARFF file. This file contains a copy of the new instances with
an additional column for the predicted value. The result looks like the following.
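Conceptually, the saved file is the new data with one extra column. The sketch below mimics that with a hand-written rule standing in for the tree. The attributes, thresholds and customers are all hypothetical and were not learned by J48 from the author's data.

```python
def predict(customer):
    """Hypothetical two-level decision tree; thresholds are made up."""
    if customer["income"] > 40000:
        return "yes"
    if customer["age"] < 30:
        return "no"
    return "yes"


def add_predictions(customers):
    """Return copies of the instances with an extra 'predicted' column."""
    return [dict(c, predicted=predict(c)) for c in customers]


# Hypothetical new customers with the same attributes as the training data.
new_customers = [
    {"age": 25, "income": 30000},
    {"age": 47, "income": 52000},
    {"age": 51, "income": 28000},
]

for row in add_predictions(new_customers):
    print(row)
```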

Regression Using WEKA
Problem:
The goal is to find out how CPU performance is related to attributes such as machine cycle
time, minimum main memory and cache memory.

Regression is a statistical tool for finding out how the dependent variable (CPU
performance) is related to the independent attributes.

Steps to do Regression in WEKA
Step 1: Create the data file and open WEKA in the same way as for classification.

Step 2: Load the regression data file CPU.arff into WEKA.

Click “Open file” and browse for the file. This shows the following screen.




Step 3: Run the regression

Click the “Classify” tab and choose “LinearRegression” under the “functions” node. This shows
the following screen.
Click “Start”. The “Classifier output” panel then shows the regression equation.
Interpretation of the output:

CPU performance depends most strongly on CHMAX, followed by CACHE.

The correlation coefficient of 0.912 is very high, which suggests that the dependent
variable is strongly associated with the independent variables.

Given the attribute values of a new machine, we can also predict its CPU performance from the
regression equation.
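
Applying a regression equation and measuring a correlation coefficient can be sketched as follows. The intercept and coefficients are placeholders invented for the example, not the equation Weka produced for CPU.arff. MYCT and MMIN are assumed names for machine cycle time and minimum main memory.

```python
import math

# Hypothetical regression equation; substitute the values from Weka's output.
INTERCEPT = -50.0
COEFFICIENTS = {"MYCT": 0.05, "MMIN": 0.015, "CACHE": 0.6, "CHMAX": 1.5}


def predict_performance(machine):
    """Evaluate intercept + sum(coefficient * attribute value)."""
    return INTERCEPT + sum(w * machine[name] for name, w in COEFFICIENTS.items())


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical new machine.
machine = {"MYCT": 100, "MMIN": 2000, "CACHE": 32, "CHMAX": 16}
print("predicted performance:", predict_performance(machine))

# Correlation between actual and predicted values (hypothetical numbers).
actual = [20.0, 45.0, 90.0, 130.0]
predicted = [25.0, 40.0, 95.0, 120.0]
print("correlation:", pearson(actual, predicted))
```

A coefficient close to 1 means the predicted values rise and fall together with the actual ones. It does not by itself show that the predictions are close in absolute terms.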
