The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It includes virtually all the algorithms described in this book. It is designed so that you can quickly try out existing
methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. As well as a wide variety of learning algorithms, it includes a wide range of preprocessing tools. This diverse and comprehensive
toolkit is accessed through a common interface so that its users can compare different methods and identify those that are most appropriate for the problem at hand. (Witten and Frank, 2005)
Lecture 4: The Weka Package
1. Lecture 4: The Weka Package
Marina Santini, Uppsala University
Department of Linguistics and Philology, September 2013
Lec 4: The Weka Package
Machine Learning for Language Technology
2. Outline
Re: Witten & Frank (2005)
Introduction to Weka (Ch. 9)
Getting Started: The Explorer (Ch. 10)
The basic methods (4.3, 4.6, 4.7)
Implementations (6.1, 6.3, 6.4)
Evaluation (5.1-5.6)
Assignment 1
3. Introduction: What is Weka?
WEKA: Waikato Environment for Knowledge Analysis
Weka: the name of a flightless bird living in New Zealand
The Weka workbench is a collection of state-of-the-art machine
learning algorithms and data preprocessing tools;
Open source code (GNU General Public License) written in Java
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
4. The interface: The Explorer
Uploading the input (ARFF format);
Preprocessing
Building a classifier;
Tuning the parameters;
Examining the output (evaluation)
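
For orientation, an ARFF file is plain text: a header that names the relation and declares each attribute, followed by a @data section with one comma-separated instance per line. The sketch below reads a tiny ARFF string in Python. The toy dataset and the helper name parse_arff are invented for illustration, and the reader covers only the simplest case (unquoted names, numeric and nominal attributes, % comments); it is not Weka's own loader.

```python
# Minimal ARFF reader: illustrative only, not Weka's own loader.
# Handles unquoted names, numeric/nominal attributes, and '%' comments.

ARFF_TEXT = """\
% Toy data invented for this example
@relation plural_toy
@attribute length numeric
@attribute final_letter {e, n, r}
@attribute plural_class {en, s, e}
@data
5, e, en
4, r, s
6, n, e
"""


def parse_arff(text):
    """Return (relation, attributes, rows) from a simple ARFF string."""
    relation = None
    attributes = []   # (name, type); a nominal type is a list of values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                      # skip blanks and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                row.append(float(value) if kind == "numeric" else value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):      # nominal: {a, b, c}
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), rows[0])
```

In the Explorer none of this parsing is done by hand; the point is only to show what the input file contains before preprocessing starts.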
8. Methods & Implementations
Decision Trees
J4.8 is Weka’s implementation of C4.5 revision 8.
Instance-Based Learning
IBk is a k-nearest-neighbor classifier that uses the Euclidean distance by
default; other options include Manhattan, Chebyshev and Minkowski
distances. The number of nearest neighbors (default k=1) can be
specified explicitly in the parameter window.
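
To make the distance options concrete, here is a minimal k-nearest-neighbor sketch in plain Python. It is not IBk itself: the function names (minkowski, chebyshev, knn_classify) and the toy points are assumptions made for this example, and only the default k=1 mirrors the slide.

```python
from collections import Counter


def minkowski(a, b, p=2.0):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest per-coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=1, distance=minkowski):
    """Predict the majority label among the k nearest training points.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set invented for this example.
train = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.5), "b"),
]
print(knn_classify(train, (1.2, 1.1)))                           # k=1, Euclidean
print(knn_classify(train, (5.2, 5.0), k=3))                      # k=3, Euclidean
print(knn_classify(train, (5.2, 5.0), k=3, distance=chebyshev))  # k=3, Chebyshev
```

Passing the distance as a function keeps the classifier unchanged when the metric is swapped, which is roughly what choosing a different distance in the parameter window does.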
Linear Models
In VotedPerceptron, each weight vector contributes a certain number
of votes.
SMO implements the sequential minimal optimization algorithm for
training a support vector classifier (SVM), using polynomial or
Gaussian kernels (Platt 1998, Keerthi et al. 2001).
Logistic builds linear logistic regression models.
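
The voting idea behind VotedPerceptron can be sketched in a few lines of Python. This follows the textbook voted perceptron, in which a weight vector earns one vote for every training example it classifies correctly before it is replaced. It is a sketch under that assumption, not Weka's code; the function names and the toy data are invented for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sign(value):
    return 1 if value > 0 else -1


def train_voted_perceptron(data, epochs=5):
    """Return a list of (weight_vector, votes) pairs.

    data is a list of (feature_vector, label) with labels in {-1, +1}.
    """
    weights = [0.0] * len(data[0][0])
    votes = 0
    history = []
    for _ in range(epochs):
        for x, y in data:
            if y * dot(weights, x) <= 0:      # mistake: retire this vector
                history.append((weights, votes))
                weights = [w + y * xi for w, xi in zip(weights, x)]
                votes = 1
            else:                             # correct: one more vote
                votes += 1
    history.append((weights, votes))
    return history


def predict(history, x):
    """Each stored vector casts `votes` votes for its own prediction."""
    total = sum(votes * sign(dot(weights, x)) for weights, votes in history)
    return sign(total)


# Toy, linearly separable data; the last feature is a constant bias input.
data = [
    ((2.0, 1.0, 1.0), 1),
    ((3.0, 2.0, 1.0), 1),
    ((-1.0, -2.0, 1.0), -1),
    ((-2.0, -1.0, 1.0), -1),
]
model = train_voted_perceptron(data)
print([predict(model, x) for x, _ in data])
```

Vectors that survive many examples dominate the final vote, so a late, unlucky update has less influence than it would in a plain perceptron that keeps only the last weight vector.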
12. Assignment 1
Classification: Decision Trees, Nearest Neighbors and a linear
classifier of your choice;
Software package: Weka;
Data sets:
German plural
English past tense
Send WRITTEN REPORT to: santinim@stp.lingfil.uu.se
Report deadline Fri 4 Oct 2013, week 40.