Lecture 4:
The Weka Package
Marina Santini, Uppsala University
Department of Linguistics and Philology, September 2013
Lec 4:TheWeka Package1
Machine Learning for Language Technology
Outline
Re: Witten & Frank (2005)
 Introduction to Weka (Ch. 9)
 Getting Started: The Explorer (Ch. 10)
 The basic methods (4.3, 4.6, 4.7)
 Implementations (6.1, 6.3, 6.4)
 Evaluation (5.1-5.6)
 Assignment 1
Introduction: What is Weka?
 WEKA: Waikato Environment for Knowledge Analysis
 Weka: the name of a flightless bird living in New Zealand
 The Weka workbench is a collection of state-of-the-art machine
learning algorithms and data preprocessing tools;
 Open source code (GNU General Public License) written in
Java
 http://www.cs.waikato.ac.nz/ml/weka/downloading.html
The interface: The Explorer
 Uploading the input (ARFF format);
 Preprocessing
 Building a classifier;
 Tuning the parameters;
 Examining the output (evaluation)
Uploading the input
(2nd_set_7webgenres.arff)
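
As a rough illustration of what an ARFF input looks like, the sketch below embeds a tiny ARFF file and parses it with plain Python. The relation name, attribute names and data values are invented for this example and are not taken from 2nd_set_7webgenres.arff; `parse_arff` is a hypothetical helper, not part of Weka, and it handles only the simplest dense case (no quoted values, sparse rows or missing values).

```python
# Hypothetical miniature ARFF file. The attribute names and values are made up
# for illustration and do not come from 2nd_set_7webgenres.arff.
ARFF_TEXT = """\
% Lines starting with % are comments
@relation toy_webgenres

@attribute num_links numeric
@attribute avg_sentence_length numeric
@attribute genre {blog, eshop, faq}

@data
12, 18.5, blog
40, 9.0, eshop
3, 14.2, faq
"""


def parse_arff(text):
    """Parse a small, dense ARFF string into (relation, attributes, rows).

    Simplified sketch: no quoted values, sparse rows, or missing values ('?').
    """
    relation = None
    attributes = []  # (name, type) pairs in declaration order
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_name, kind), value in zip(attributes, values):
                # Numeric attributes become floats; nominal values stay strings.
                row.append(float(value) if kind == "numeric" else value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
```

The header declares the relation and its attributes (here the last, nominal attribute plays the role of the class), and every line after `@data` is one instance.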
Preprocessing
Building a classifier
Methods & Implementations
 Decision Trees
 J4.8 is Weka’s implementation of C4.5 revision 8.
 Instance-Based Learning
 IBk is a k-nearest-neighbor classifier that uses the Euclidean distance by
default; other options include the Manhattan, Chebyshev and Minkowski
distances. The number of nearest neighbors (default k=1) can be
specified explicitly in the parameter window.
 Linear Models
 In VotedPerceptron, each weight vector contributes a certain number
of votes.
 SMO implements the sequential minimal optimization algorithm for
training a support vector classifier (SVM) using polynomial or
Gaussian kernels (Platt 1998, Keerthi et al. 2001).
 Logistic builds linear logistic regression models.
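
To make the distance options for nearest-neighbor classification concrete, here is a minimal sketch in plain Python. It is not IBk's implementation: the function names, the training data and the tie-breaking behaviour are assumptions made for this example, and Weka-specific details such as attribute handling are left out.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    return minkowski(a, b, 2)


def manhattan(a, b):
    return minkowski(a, b, 1)


def chebyshev(a, b):
    """Chebyshev distance: the largest difference along any single attribute."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=1, distance=euclidean):
    """Return the majority label among the k training instances nearest to query.

    train is a list of (feature_vector, label) pairs. Ties between labels are
    broken by whichever label is encountered first among the neighbours.
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Made-up training data: two numeric attributes, two classes.
TRAIN = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.5), "b"),
    ((5.5, 6.5), "b"),
]

label_k1 = knn_classify(TRAIN, (1.2, 1.1))                           # k=1, Euclidean
label_k3 = knn_classify(TRAIN, (5.2, 5.1), k=3, distance=manhattan)  # k=3, Manhattan
```

Changing `k` or swapping the `distance` argument here corresponds loosely to adjusting the same options in the parameter window.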
Tuning Parameters
Evaluation
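
The evaluation output can be read as a comparison of predicted and actual class labels. The sketch below shows, in plain Python, how accuracy and a confusion matrix might be computed over a simple k-fold cross-validation. It is only an illustration under stated assumptions: the label sequence is invented, the fold assignment is a plain round-robin (not stratified), and the "classifier" is a majority-class baseline standing in for a real learner.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of instances whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices); instance i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_instances) if i % k == fold]
        train = [i for i in range(n_instances) if i % k != fold]
        yield train, test


def majority_class(labels):
    """Baseline 'classifier': always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate_baseline(labels, k):
    """Pool the baseline's predictions over k folds; return (actual, predicted)."""
    actual, predicted = [], []
    for train, test in k_fold_splits(len(labels), k):
        guess = majority_class([labels[i] for i in train])
        for i in test:
            actual.append(labels[i])
            predicted.append(guess)
    return actual, predicted


# Made-up class labels: 7 instances of class "x", 3 of class "y".
LABELS = ["x", "x", "y", "x", "x", "y", "x", "x", "y", "x"]

actual, predicted = cross_validate_baseline(LABELS, k=5)
acc = accuracy(actual, predicted)
matrix = confusion_matrix(actual, predicted)
```

A baseline like this gives a reference point: a real classifier is only interesting if its accuracy clearly exceeds that of always predicting the majority class.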
Compare Results
Assignment 1
 Classification: Decision Trees, Nearest Neighbors and a linear
classifier of your choice;
 Software package: Weka;
 Data sets:
 German plural
 English past tense
 Send WRITTEN REPORT to: santinim@stp.lingfil.uu.se
 Report deadline Fri 4 Oct 2013, week 40.
Thank you and Good Luck!
