Lecture 4: The Weka Package

Marina Santini, Uppsala University
Department of Linguistics and Philology, September 2013
Machine Learning for Language Technology

Outline
Reference: Witten & Frank (2005)
- Introduction to Weka (Ch. 9)
- Getting Started: The Explorer (Ch. 10)
- The basic methods (4.3, 4.6, 4.7)
- Implementations (6.1, 6.3, 6.4)
- Evaluation (5.1-5.6)
- Assignment 1

Introduction: What is Weka?
- WEKA: Waikato Environment for Knowledge Analysis
- Weka is also the name of a flightless bird native to New Zealand
- The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools
- Open-source code (GNU General Public License), written in Java
- http://www.cs.waikato.ac.nz/ml/weka/downloading.html

The interface: The Explorer
- Uploading the input (ARFF format)
- Preprocessing
- Building a classifier
- Tuning the parameters
- Examining the output (evaluation)

Uploading the input
(2nd_set_7webgenres.arff)
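The Explorer reads data in ARFF: a header with an @relation line and one @attribute line per attribute (numeric, or nominal with its allowed values in braces), followed by comma-separated instances after @data, where ? marks a missing value. The contents of 2nd_set_7webgenres.arff are not reproduced here; the minimal sketch below parses a hypothetical toy file (TOY_ARFF) only to show the structure. parse_arff is this example's own helper, not Weka's loader.

```python
# Hypothetical toy data in ARFF format (not the lecture's web-genre file).
TOY_ARFF = """% comment lines start with a percent sign
@relation toy-genres
@attribute links numeric
@attribute pronouns numeric
@attribute genre {blog, eshop}
@data
12, 30, blog
45, 2, eshop
?, 25, blog
"""

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Parse a small dense ARFF file into (relation, attributes, rows).

    Supports @relation, numeric and nominal @attribute lines, and @data rows
    with '?' for missing values. Quoted names, sparse rows, string and date
    attributes are not handled: this is a sketch, not a full ARFF reader.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            name, spec = line.split(None, 2)[1:]
            spec = spec.strip()
            if spec.startswith("{"):  # nominal attribute: list its values
                values = [v.strip() for v in spec.strip("{}").split(",")]
                attributes.append((name, values))
            else:  # e.g. 'numeric'
                attributes.append((name, spec.lower()))
        elif keyword == "@data":
            in_data = True
        elif in_data:
            row = []
            for (name, spec), field in zip(attributes, line.split(",")):
                field = field.strip()
                if field == "?":
                    row.append(None)  # missing value
                elif spec in NUMERIC_TYPES:
                    row.append(float(field))
                else:
                    row.append(field)  # nominal value kept as a string
            rows.append(row)
    return relation, attributes, rows
```

Calling parse_arff(TOY_ARFF) yields the relation name, a list of (name, type) pairs, and one Python list per instance.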

Preprocessing
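A typical preprocessing step in the Explorer is applying a filter; for example, Weka's unsupervised Normalize filter rescales numeric attributes to [0, 1] by default. The minimal min-max sketch below illustrates that idea in plain Python; the function name, and mapping a constant column to 0.0, are this example's own choices rather than Weka's code.

```python
def min_max_normalize(rows, columns):
    """Rescale the given numeric columns of `rows` to [0, 1], column by column.

    Missing values (None) are left untouched; a constant column maps to 0.0
    (an arbitrary choice made for this sketch).
    """
    out = [list(r) for r in rows]  # copy so the input rows are not modified
    for c in columns:
        present = [r[c] for r in rows if r[c] is not None]
        lo, hi = min(present), max(present)
        span = hi - lo
        for r in out:
            if r[c] is not None:
                r[c] = (r[c] - lo) / span if span else 0.0
    return out
```

Only the listed columns are touched, so a nominal class column can simply be left out of `columns`.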

Building a classifier

Methods & Implementations
- Decision Trees
  - J4.8 is Weka’s implementation of C4.5 revision 8.
- Instance-Based Learning
  - IBk is a k-nearest-neighbor classifier that uses Euclidean distance by default; other options include Manhattan, Chebyshev and Minkowski distances. The number of nearest neighbors (default k = 1) can be set explicitly in the parameter window.
- Linear Models
  - In VotedPerceptron, each weight vector produced during training casts a number of votes given by how many successive training instances it classified correctly before being updated.
  - SMO implements the sequential minimal optimization algorithm for training a support vector classifier (SVM), using polynomial or Gaussian kernels (Platt 1998, Keerthi et al. 2001).
  - Logistic builds linear logistic regression models.
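J4.8, like C4.5, chooses splits by gain ratio: the information gain of a split divided by its intrinsic (split) information, which penalizes attributes with many values. The sketch below computes it for one nominal attribute on the outlook attribute of the weather data in Witten & Frank. It omits C4.5's other refinements (numeric thresholds, missing values, pruning, and restricting the choice to attributes with at least average gain); the function names are this example's own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the nominal attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # intrinsic information of the split itself
    return info_gain / split_info if split_info > 0 else 0.0


# Outlook vs. play in the weather data: sunny 2 yes/3 no, overcast 4 yes, rainy 3 yes/2 no.
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5
play = ["yes"] * 2 + ["no"] * 3 + ["yes"] * 4 + ["yes"] * 3 + ["no"] * 2
```

For outlook this gives an information gain of about 0.247 bits and a gain ratio of about 0.156.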
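The IBk entry above names the distance options. The sketch below shows a plain k-nearest-neighbor majority vote with Euclidean, Manhattan and Chebyshev distances, the first two as special cases of Minkowski distance. It is not Weka's code: in particular, Weka's distance functions normalize attribute ranges by default, which this sketch omits, and the function names are hypothetical.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest per-attribute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


DISTANCES = {
    "euclidean": lambda a, b: minkowski(a, b, 2),
    "manhattan": lambda a, b: minkowski(a, b, 1),
    "chebyshev": chebyshev,
}


def knn_predict(train_X, train_y, query, k=1, distance="euclidean"):
    """Majority vote among the k training instances closest to `query`.

    Distance ties keep training order (sorted() is stable); vote ties go to
    the label of the nearer neighbor, since most_common() keeps first-seen order.
    """
    dist = DISTANCES[distance]
    ranked = sorted(zip(train_X, train_y), key=lambda xy: dist(xy[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Other Minkowski exponents can be tried by calling minkowski(a, b, p) directly, for instance with p = 3.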
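The voting scheme behind VotedPerceptron can be sketched as follows, assuming labels in {-1, +1} and a plain linear perceptron (no kernel). Every weight vector is kept together with its survival count, and prediction is the sign of the vote-weighted sum of the individual predictions. The function names are hypothetical, and Weka's implementation has further options not shown here.

```python
def train_voted_perceptron(X, y, epochs=1):
    """Voted perceptron for labels in {-1, +1} (linear, no kernel).

    Returns a list of (weights, bias, votes), where votes counts the
    successive training instances a weight vector classified correctly
    before a mistake replaced it.
    """
    w, b, votes = [0.0] * len(X[0]), 0.0, 0
    perceptrons = []
    for _ in range(epochs):
        for x, target in zip(X, y):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * activation > 0:
                votes += 1  # current vector survives one more instance
            else:
                if votes > 0:
                    perceptrons.append((w, b, votes))  # retire it with its votes
                w = [wi + target * xi for wi, xi in zip(w, x)]  # perceptron update
                b += target
                votes = 0
    perceptrons.append((w, b, votes))
    return perceptrons


def predict_voted(perceptrons, x):
    """Sign of the vote-weighted sum of each stored vector's +1/-1 prediction."""
    total = 0
    for w, b, votes in perceptrons:
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += votes * (1 if activation > 0 else -1)
    return 1 if total > 0 else -1
```

Vectors that survived long dominate the vote, which makes the voted perceptron more stable than using only the final weight vector.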
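SMO's job is to find the multipliers (alphas) and bias of a kernel SVM; the kernel choice is what the SMO entry above refers to. The sketch below does not implement SMO itself. It only shows a polynomial kernel, a Gaussian (RBF) kernel, and the SVM decision function those multipliers feed into. The function names and default parameter values are this example's own.

```python
from math import exp


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a, b, exponent=2, lower_order=False):
    """(a . b)^p, or (a . b + 1)^p when lower-order terms are included."""
    base = dot(a, b) + (1.0 if lower_order else 0.0)
    return base ** exponent


def gaussian_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return exp(-gamma * sq_dist)


def svm_decision(support_vectors, labels, alphas, bias, x, kernel):
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b; the sign gives the class."""
    return sum(a * y * kernel(s, x)
               for s, y, a in zip(support_vectors, labels, alphas)) + bias


def linear(a, b):
    """Linear kernel as the exponent-1 polynomial kernel."""
    return polynomial_kernel(a, b, exponent=1)
```

For the two-point problem with (1, 0) labelled +1 and (-1, 0) labelled -1, alphas of 0.5 each and bias 0 solve the hard-margin SVM, so svm_decision(((1, 0), (-1, 0)), (1, -1), (0.5, 0.5), 0.0, (2, 0), linear) returns 2.0, a positive-class prediction.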
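A linear logistic regression model estimates P(class | x) as the logistic (sigmoid) function of a weighted sum of the attributes. Weka's Logistic fits a multinomial model with a ridge estimator. The sketch below is simpler: a binary model (labels 0/1) with an L2 penalty, trained by plain batch gradient descent. That optimizer is a substitute chosen for brevity, and the function names and hyperparameter values are hypothetical.

```python
from math import exp


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, ridge=0.01, epochs=500):
    """Binary logistic regression (y in {0, 1}) by batch gradient descent on
    the mean log-loss plus an L2 penalty (ridge / 2) * ||w||^2 on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for j in range(d):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        w = [wi - lr * (g + ridge * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The decision boundary is where the weighted sum equals zero (probability 0.5), which is why the model is called linear.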

Tuning Parameters

Evaluation
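By default the Explorer evaluates a classifier with stratified 10-fold cross-validation: the data are split into folds that preserve the class proportions, each fold is used once for testing while the rest is used for training, and the predictions are pooled. The minimal sketch below shows that procedure with a majority-class baseline in the spirit of Weka's ZeroR. All function names are hypothetical, and `fit` is assumed to return a prediction function.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign instance indices to k folds, keeping each fold's class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)  # deal instances round-robin
            position += 1
    return folds


def majority_class(train_X, train_y):
    """Baseline that always predicts the most frequent training class."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def cross_val_accuracy(X, y, fit, k=10, seed=1):
    """Accuracy pooled over k stratified folds (each instance tested once)."""
    correct = 0
    for fold in stratified_folds(y, k, seed):
        test = set(fold)
        train_idx = [i for i in range(len(y)) if i not in test]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in fold)
    return correct / len(y)
```

Any classifier wrapped as `fit(train_X, train_y) -> predict` can be plugged in, so the same folds can be reused to compare methods against the baseline.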

Compare Results
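Besides overall accuracy, Weka's evaluation output includes a per-class breakdown ("Detailed Accuracy By Class", with precision, recall and F-measure) and a confusion matrix whose rows are the actual classes. Comparing classifiers on these figures, rather than on accuracy alone, shows which classes each one confuses. A minimal sketch, with hypothetical function names, computing them from actual and predicted labels:

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_scores(matrix, classes):
    """Precision, recall and F-measure for each class, from a confusion matrix."""
    scores = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        predicted_c = sum(row[i] for row in matrix)  # column total
        actual_c = sum(matrix[i])                    # row total
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        total = precision + recall
        f = 2 * precision * recall / total if total else 0.0
        scores[c] = (precision, recall, f)
    return scores
```

Recall for a class equals its true-positive rate; the F-measure is the harmonic mean of precision and recall.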

Assignment 1
- Classification: Decision Trees, Nearest Neighbors and a linear classifier of your choice
- Software package: Weka
- Data sets:
  - German plural
  - English past tense
- Send WRITTEN REPORT to: santinim@stp.lingfil.uu.se
- Report deadline: Fri 4 Oct 2013, week 40.

Thank you and Good Luck!
