CS 348: Introduction to Artificial Intelligence
                                 Lab 3: Decision Trees
This lab will introduce you to machine learning using decision trees. Decision tree induction has been
described in class and is in section 18.3 of the textbook. Decision tree induction is a machine learning
approach to approximating f, given a set of examples. An example is a tuple <x1, x2,…, xn, f(x1, x2,…, xn)>
consisting of values for the n inputs to the function f and the output of f, given those values.
For this lab, you will construct a binary decision tree learner, examine its performance on a variety of
binary classification problems, and report the results. The following sections describe the file format for
examples, the kind of executable to create, the questions to answer, and what needs to be handed in.

INPUT FILE FORMAT
The input file format is simple. Input files are text files. The first line of each file contains a list of attribute
names. Each attribute name is separated from the following attribute by one or more blank characters
(spaces and tabs). Each additional line is an example. Each example line contains n+1 binary values, where
n is the number of attributes in the first line. Binary values are encoded as the lower case words “true” and
“false.” The ith value in each example line is the value of the ith attribute in that example. The final value
in each example line is the categorization of that example. The task for a machine learner is to learn how to
categorize examples, using only the values specified for the attributes, so that the machine’s categorization
matches the categorization specified in the file.
The following is an example of the input file format for a function of three binary attributes.

ivy_school good_gpa good_letters
true true true true
true true true false
true false true false
false true true false
true false true false
true true true true
false true true true
true false false true
false false false false
true true false false
false false false true
false false false true
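The format above can be read with a few lines of code. The following Java sketch (the class and method names are my own, not part of the spec) parses the header line into attribute names and each remaining line into a boolean array whose last entry is the classification:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExampleReader {
    // Parse the lab's input format: the first line holds attribute names,
    // and each later line holds n attribute values plus a classification.
    public static List<boolean[]> read(BufferedReader in, List<String> attrNames)
            throws IOException {
        String header = in.readLine();
        attrNames.addAll(Arrays.asList(header.trim().split("\\s+")));
        List<boolean[]> examples = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] toks = line.split("\\s+");
            boolean[] row = new boolean[toks.length]; // last entry = class label
            for (int i = 0; i < toks.length; i++) row[i] = toks[i].equals("true");
            examples.add(row);
        }
        return examples;
    }
}
```

Splitting on `\s+` handles the mix of spaces and tabs the format allows.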

THE EXECUTABLE
Your program must be written in C, C++, Java, or Lisp. The executable requirements for the varying
languages are outlined below.
If your program is written in C, C++, or Java:
         Your executable must run in Windows XP and must be callable from the command line. It must be
         named dtree.exe (in the case of a native Windows executable) or dtree.jar (in the case of a Java
         bytecode executable). The executable must accept the three parameters shown below, in the
         order shown.
         dtree.exe <file name> <training set size> <number of trials>
         The previous line is for a Windows XP executable compiled from C or C++. Your Windows
         executable must conform to this specification.
         In this specification, <file name> is the name of the text file containing the examples, <training set
         size> is an integer specifying the number of examples to include in the training set, and <number
         of trials> is the number of times a training set will be selected to create a decision tree.
         If you have chosen to create your program in Java, we require that you create an executable .jar
         file so that we may call the file using the following syntax.
         java -jar dtree.jar <file name> <training set size> <number of trials>
         If you do not know how to create a .jar file, there is an excellent tutorial available at the following
         URL.
         http://java.sun.com/docs/books/tutorial/jar/
         For Java code, your class must be runnable on a Windows machine with Java 1.4.X or later, and it
         should require no change to the CLASSPATH. You can test this by trying your code on other
         machines with Java and making sure you aren't missing any dependencies. If you have questions
         about this, please email Sara, the TA.
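As a minimal illustration of this command-line contract (the helper name and the array it returns are illustrative, not required by the spec), the three parameters can be validated like this in Java:

```java
public class DtreeArgs {
    // Validate the required command line:
    //   dtree <file name> <training set size> <number of trials>
    // Returns {trainingSetSize, numberOfTrials}; the file name stays in args[0].
    static int[] parseArgs(String[] args) {
        if (args.length != 3)
            throw new IllegalArgumentException(
                "usage: dtree <file name> <training set size> <number of trials>");
        int trainSize = Integer.parseInt(args[1]);
        int trials = Integer.parseInt(args[2]);
        if (trainSize < 1 || trials < 1)
            throw new IllegalArgumentException("sizes must be positive integers");
        return new int[]{trainSize, trials};
    }
}
```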
If your program is written in Lisp:
         Your code should be written in a file called dtree.lisp. Within this file, you should include a
         function called “dtree” which takes three parameters as mentioned above (file name, training
         set size, and number of trials). Your code will be tested using “Allegro CL,” so you should make
         sure that it runs in that environment.


When run, your executable must perform the following steps.
    1) Read in the text file containing the examples.
    2) Divide the set of examples into a training set and a testing set by randomly selecting the number of
       examples for the training set specified in the command-line input <training set size>. Use the
       remainder for the testing set.
    3) Estimate the expected probability of TRUE and FALSE classifications, based on the examples in
         the training set.
    4) Construct a decision tree, based on the training set, using the approach described in section 18.3 of
         the text.
    5) Classify the examples in the testing set using the decision tree built in step 4.
    6) Classify the examples in the testing set using the prior probabilities from step 3.
    7) Determine the proportion of correct classifications made in steps 5 and 6 by comparing the
         classifications to the correct answers.
    8) Steps 2 through 7 constitute a trial. Repeat steps 2 through 7 until the number of completed
         trials equals the value specified in the command-line input <number of trials>.
    9) Print the results for each trial to an output file called output.txt. The format of output.txt is
         specified in the following section.
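Step 4 is the heart of the lab. A compact Java sketch of section 18.3's induction loop — at each node, choose the attribute with the highest information gain, split on it, and recurse — might look like the following. This is one possible shape, not the required implementation; the names and the tree representation are my own:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DTree {
    // A node is either a leaf (label != null) or a split on one attribute index.
    Integer attr; Boolean label; DTree ifTrue, ifFalse;

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Entropy of the class label over a set of examples.
    static double entropy(List<boolean[]> ex, int classIdx) {
        if (ex.isEmpty()) return 0;
        long t = ex.stream().filter(e -> e[classIdx]).count();
        double p = (double) t / ex.size();
        if (p == 0 || p == 1) return 0;
        return -p * log2(p) - (1 - p) * log2(1 - p);
    }

    static DTree induce(List<boolean[]> ex, Set<Integer> attrs, int classIdx, boolean deflt) {
        DTree n = new DTree();
        if (ex.isEmpty()) { n.label = deflt; return n; }
        long t = ex.stream().filter(e -> e[classIdx]).count();
        boolean majority = 2 * t >= ex.size();
        if (t == 0 || t == ex.size() || attrs.isEmpty()) { n.label = majority; return n; }
        double best = -1; int bestA = -1;
        for (int a : attrs) {                       // pick the highest-gain attribute
            List<boolean[]> pos = new ArrayList<>(), neg = new ArrayList<>();
            for (boolean[] e : ex) (e[a] ? pos : neg).add(e);
            double remainder = (pos.size() * entropy(pos, classIdx)
                              + neg.size() * entropy(neg, classIdx)) / ex.size();
            double gain = entropy(ex, classIdx) - remainder;
            if (gain > best) { best = gain; bestA = a; }
        }
        n.attr = bestA;
        Set<Integer> rest = new HashSet<>(attrs); rest.remove(bestA);
        List<boolean[]> pos = new ArrayList<>(), neg = new ArrayList<>();
        for (boolean[] e : ex) (e[bestA] ? pos : neg).add(e);
        n.ifTrue = induce(pos, rest, classIdx, majority);
        n.ifFalse = induce(neg, rest, classIdx, majority);
        return n;
    }

    boolean classify(boolean[] e) {
        if (label != null) return label;
        return (e[attr] ? ifTrue : ifFalse).classify(e);
    }
}
```

The step 6 baseline is much simpler: classify every test example with the majority class of the training set, using the prior estimated in step 3.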

OUTPUT FILE FORMAT
Each run of your decision tree program should create an output text file that contains the following
information:
    •    The input file name
    •    The training set size
    •    The number of trials
In addition, you must provide the following information for each trial:
    •    The number of the trial
    •    The set of examples in the training set
    •    The set of examples in the testing set
    •    The classification returned by the decision tree for each member of the testing set
    •    The classification returned by applying prior probability to each member of the testing set
    •    The proportion of correct classifications returned by the decision tree
    •    The proportion of correct classifications returned by applying prior probability
If there are multiple trials, then this information should be in the output file for EACH AND EVERY trial.
We will not require that a particular layout be used. That said, if we have ANY problem finding the
information specified above in your output file, you WILL lose points. It is up to you to make your output
format CLEAR.
What follows is an example output for a single trial in a format that would be completely acceptable. (Yes,
I am aware that trial 1 and trial 2 are identical. This is merely an example of an acceptable output file
format. In your actual output, two trials would NOT be identical, because the selection of the training set
would be RANDOM, resulting in trials that are different.)
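For concreteness, the bookkeeping that produces numbers like those below — the random split of step 2, the prior-probability baseline of steps 3 and 6, and the proportion correct of step 7 — can be sketched as follows. The class and method names are illustrative, and ties in the prior go to true here, which is an arbitrary choice; it assumes the training set size is smaller than the total number of examples:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class Trial {
    // One trial's baseline: shuffle, split off trainSize examples for training,
    // estimate the prior from the training set, and score a classifier that
    // always predicts the more probable class on the test set.
    static double priorBaselineAccuracy(List<boolean[]> all, int trainSize, Random rng) {
        List<boolean[]> copy = new ArrayList<>(all);
        Collections.shuffle(copy, rng);                       // random selection (step 2)
        List<boolean[]> train = copy.subList(0, trainSize);
        List<boolean[]> test = copy.subList(trainSize, copy.size());
        int classIdx = all.get(0).length - 1;
        long trues = train.stream().filter(e -> e[classIdx]).count();
        boolean predict = 2 * trues >= trainSize;             // majority class (ties -> true)
        long correct = test.stream().filter(e -> e[classIdx] == predict).count();
        return (double) correct / test.size();                // proportion correct (step 7)
    }
}
```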




Input file name: Examples1.txt
Training set size: 7
Number of trials: 2
TRIAL 1:
Training set:
ivy_school good_gpa good_letters correctClass
true true true                      true
false true true                     true
true false false                    true
false false false                   false
true true false                     false
false false false                   true
false false false                   true
Test set:
ivy_school good_gpa good_letters correctClass dTreeClass priorProbClass
true true true                      true        true        true
true true true                      false       false       true
true false true                     false       true        true
false true true                     false       false       false
true false true                     false       false       true

Decision Tree proportion correct: .8
Prior Probability proportion correct: .2

TRIAL 2:
Training set:
ivy_school good_gpa good_letters correctClass
true true true                      true
false true true                     true
true false false                    true
false false false                   false
true true false                     false
false false false                   true
false false false                   true
Test set:
ivy_school good_gpa good_letters correctClass dTreeClass priorProbClass
true true true                      true        true        true
true true true                      false       false       true
true false true                     false       true        true
false true true                     false       false       false
true false true                     false       false       true

Decision Tree proportion correct: .8
Prior Probability proportion correct: .2

QUESTIONS TO ANSWER
Create an MS Word or Adobe Acrobat document that contains the answers to the
following questions. NOTE: The answer to each of these questions should take between
½ and one full page. Do not take less than ½ a page. Do not take more than one page. We
won’t grade more than one page per question. You may use tables or figures if this helps.
When answering these questions, you MUST support your answers with experimental
data gathered from running your program. When creating your data, don’t forget about
the importance of multiple trials.
    1) Compare the performance of the decision tree algorithm to that of simply using prior probability
         on the MajorityRule.txt example set.
    2) How does the size of the training set affect the performance of the decision tree algorithm on the
       IvyLeague.txt example set?
    3) Create a binary decision task that uses at least four attributes. Describe the task. Construct an
         example set for this task that contains at least 32 examples. Evaluate the performance of the
         decision tree algorithm on this task.

WHAT TO HAND IN
You are to submit the following things:
    1) The source code for your program: It should be sufficiently modular and well commented. If it is
         not well commented, don’t be surprised if you lose points.
    2) An executable for the program that conforms to the specifications in “The Executable” section.
    3) A document containing answers to the lab questions, as specified in the “Questions to Answer”
         section.
    4) The input file you created for question 3.

HOW TO HAND IT IN
To submit your lab, compress all of the files specified into a .zip file. The file should be named in the
following manner: lastname_labnumber.zip. For example, Owsley_3.zip. Submit this .zip file via email
to cs348@cs.northwestern.edu.
*NOTE* Because .exe files are often stripped from email attachments, please rename your “.exe”
executable to “dtree.txt” in the .zip file you send us.

Due by the start of class on Wednesday, May 31, 2006.
