Learning from Observations

         Chapter 18, Sections 1–3
Outline
• Learning agents
• Inductive learning
• Decision tree learning
Learning
• Learning is essential for unknown environments,
  – i.e., when designer lacks omniscience

• Learning is useful as a system construction
  method,
  – i.e., expose the agent to reality rather than trying to
    write it down

• Learning modifies the agent's decision
  mechanisms to improve performance
Learning agents

[Figure: architecture of a general learning agent, with performance element, learning element, critic, and problem generator]
Learning element
• Design of a learning element is affected by
  – Which components of the performance element are to
    be learned
  – What feedback is available to learn these components
  – What representation is used for the components

• Type of feedback:
  – Supervised learning: correct answers for each
    example
  – Unsupervised learning: correct answers not given
  – Reinforcement learning: occasional rewards
Inductive learning
• Simplest form: learn a function from examples

f is the target function

An example is a pair (x, f(x))

Problem: find a hypothesis h
   such that h ≈ f
   given a training set of examples

(This is a highly simplified model of real learning:
   – Ignores prior knowledge
   – Assumes examples are given)
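
As a toy illustration of this setup, the sketch below (in Python, with a hypothetical target f and hypothesis h) represents the training set as (x, f(x)) pairs and checks whether a hypothesis is consistent with it:

```python
# A minimal sketch of the setup: examples are (x, f(x)) pairs, and a
# hypothesis h is consistent if it agrees with f on every example.
# The target f and hypothesis h here are illustrative stand-ins.

def f(x):                      # hidden target function (unknown to the learner)
    return x * x

examples = [(x, f(x)) for x in [0, 1, 2, 3]]   # training set of (x, f(x)) pairs

def is_consistent(h, examples):
    """True if h agrees with f on all training examples."""
    return all(h(x) == y for x, y in examples)

h = lambda x: x ** 2           # a candidate hypothesis
print(is_consistent(h, examples))   # True: h matches f on the training set
```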
Inductive learning method
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:


[Figures: a sequence of curve fits to the same data points, showing candidate hypotheses of increasing complexity]

• Ockham’s razor: prefer the simplest hypothesis
  consistent with data
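
A minimal sketch of the curve-fitting example, assuming NumPy (the slides name no library). Both a simple and a complex polynomial can agree with the training data, but Ockham's razor favors the simple one, which typically predicts better off the training set:

```python
# Simple vs. complex hypotheses fitted to the same noisy data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(0, 0.1, size=8)   # noisy linear target

simple   = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)
complex_ = np.polynomial.Polynomial.fit(x_train, y_train, deg=7)

x_new = 1.5                                # a point outside the training range
print(simple(x_new), complex_(x_new))      # the degree-7 fit typically extrapolates wildly
```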
Learning decision trees
Problem: decide whether to wait for a table at a restaurant,
  based on the following attributes:
   1. Alternate: is there an alternative restaurant nearby?
   2. Bar: is there a comfortable bar area to wait in?
   3. Fri/Sat: is today Friday or Saturday?
   4. Hungry: are we hungry?
   5. Patrons: number of people in the restaurant (None, Some, Full)
   6. Price: price range ($, $$, $$$)
   7. Raining: is it raining outside?
   8. Reservation: have we made a reservation?
   9. Type: kind of restaurant (French, Italian, Thai, Burger)
   10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
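
For concreteness, one example in this representation can be written as an attribute-to-value mapping plus a target label. The instance below is hypothetical, not one of the book's examples:

```python
# One training example in attribute-value form (a hypothetical instance;
# attribute names follow the list above).
example = {
    "Alternate": True, "Bar": False, "FriSat": False, "Hungry": True,
    "Patrons": "Some", "Price": "$$$", "Raining": False,
    "Reservation": True, "Type": "French", "WaitEstimate": "0-10",
}
label = True   # WillWait: in this situation, we wait for a table
```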
Attribute-based representations
•   Examples described by attribute values (Boolean, discrete, continuous)
•   E.g., situations where I will/won't wait for a table:

[Table: the 12 training examples X1–X12, giving each attribute's value and the WillWait classification]

•   Classification of examples is positive (T) or negative (F)
Decision trees
• One possible representation for hypotheses
• E.g., here is the “true” tree for deciding whether to wait:

[Figure: the “true” decision tree for the restaurant waiting problem]
Expressiveness
•   Decision trees can express any function of the input attributes.
•   E.g., for Boolean functions, truth table row → path to leaf:

[Figure: a Boolean truth table and the equivalent decision tree, one leaf path per row]
•   Trivially, there is a consistent decision tree for any training set, with one path
    to a leaf for each example (unless f is nondeterministic in x), but it probably
    won't generalize to new examples

•   Prefer to find more compact decision trees
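
As a sketch of this row-to-path expressiveness, the nested structure below (a hypothetical encoding, not the chapter's notation) represents XOR(A, B) as a decision tree with one leaf per truth-table row:

```python
# A decision tree as a nested structure: each internal node tests an
# attribute, each leaf holds a classification. This tree expresses XOR(A, B).
xor_tree = ("A", {True:  ("B", {True: False, False: True}),
                  False: ("B", {True: True,  False: False})})

def classify(tree, example):
    if not isinstance(tree, tuple):          # leaf: return the stored label
        return tree
    attr, branches = tree
    return classify(branches[example[attr]], example)

print(classify(xor_tree, {"A": True, "B": False}))   # True
```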
Hypothesis spaces
How many distinct decision trees with n Boolean attributes?
= number of Boolean functions
= number of distinct truth tables with 2^n rows = 2^(2^n)

•   E.g., with 6 Boolean attributes, there are
    18,446,744,073,709,551,616 trees

How many purely conjunctive hypotheses (e.g., Hungry ∧ ¬Rain)?
• Each attribute can be in (positive), in (negative), or out
         ⇒ 3^n distinct conjunctive hypotheses
•   More expressive hypothesis space
     – increases chance that target function can be expressed
     – increases number of hypotheses consistent with training set
         ⇒ may get worse predictions
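
The counts above are easy to sanity-check:

```python
# Sanity check of the hypothesis-space counts above.
n = 6
print(2 ** (2 ** n))   # 18446744073709551616 distinct Boolean functions / trees
print(3 ** n)          # 729 purely conjunctive hypotheses for n = 6
```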
Decision tree learning
•   Aim: find a small tree consistent with the training examples
•   Idea: (recursively) choose "most significant" attribute as root of
    (sub)tree
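
A sketch of that recursive idea in Python. Examples are (attribute-dict, label) pairs as above; choose_attribute is passed in so the "most significant" test (defined via information gain below) stays pluggable. The base cases follow the standard decision-tree-learning treatment; the details here are illustrative, not the chapter's exact pseudocode:

```python
from collections import Counter

def plurality(examples):
    """Most common label among (attributes, label) examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, choose_attribute, default=None):
    """Recursively choose the most significant attribute as the root of each (sub)tree."""
    if not examples:
        return default                          # no data: fall back to parent majority
    if all(label == examples[0][1] for _, label in examples):
        return examples[0][1]                   # pure subset: make a leaf
    if not attributes:
        return plurality(examples)              # attributes exhausted: majority leaf
    a = choose_attribute(attributes, examples)  # e.g., highest information gain
    branches = {}
    for v in {ex[a] for ex, _ in examples}:
        subset = [(ex, lbl) for ex, lbl in examples if ex[a] == v]
        remaining = [b for b in attributes if b != a]
        branches[v] = dtl(subset, remaining, choose_attribute, plurality(examples))
    return (a, branches)
```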
Choosing an attribute
• Idea: a good attribute splits the examples into subsets
  that are (ideally) "all positive" or "all negative"

[Figure: the 12 examples split by Patrons? and by Type?; the Patrons? subsets are nearly pure, the Type? subsets are not]

• Patrons? is a better choice
Using information theory
• To implement Choose-Attribute in the DTL
  algorithm
• Information Content (Entropy):
     I(P(v1), … , P(vn)) = Σi=1..n −P(vi) log2 P(vi)
• For a training set containing p positive examples
  and n negative examples:
     I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
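
The two-class entropy above, as a small Python helper (reused in the later sketches):

```python
# Information content I(p/(p+n), n/(p+n)), in bits, of a set with
# p positive and n negative examples.
from math import log2

def entropy2(p, n):
    total = p + n
    result = 0.0
    for count in (p, n):
        q = count / total
        if q > 0:                 # 0 log 0 is taken as 0
            result -= q * log2(q)
    return result

print(entropy2(6, 6))   # 1.0 bit for the 12-example training set
```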
Information gain
• A chosen attribute A divides the training set E into
  subsets E1, … , Ev according to their values for A, where
  A has v distinct values.
      remainder(A) = Σi=1..v [(pi + ni)/(p + n)] · I(pi/(pi+ni), ni/(pi+ni))
• Information Gain (IG) or reduction in entropy from the
  attribute test:
      IG(A) = I(p/(p+n), n/(p+n)) − remainder(A)
• Choose the attribute with the largest IG
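
The same two quantities in Python, assuming the entropy2 helper from the previous sketch; split_counts lists (pi, ni) for each of the v values of A:

```python
# remainder(A) and IG(A) as defined above, given per-value (pi, ni) counts.

def remainder(split_counts, p, n):
    return sum((pi + ni) / (p + n) * entropy2(pi, ni) for pi, ni in split_counts)

def info_gain(split_counts, p, n):
    return entropy2(p, n) - remainder(split_counts, p, n)
```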
Information gain
For the training set, p = n = 6, I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

  IG(Patrons) = 1 − [2/12·I(0,1) + 4/12·I(1,0) + 6/12·I(2/6, 4/6)] = 0.541 bits

  IG(Type) = 1 − [2/12·I(1/2,1/2) + 2/12·I(1/2,1/2) + 4/12·I(2/4,2/4) + 4/12·I(2/4,2/4)] = 0 bits

Patrons has the highest IG of all attributes and so is chosen by the DTL
  algorithm as the root
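
Reproducing both numbers with the earlier helpers; the per-value (positive, negative) counts are read straight off the formulas above:

```python
# Verify the worked example using entropy2 / info_gain defined above.
patrons_split = [(0, 2), (4, 0), (2, 4)]            # None, Some, Full
type_split    = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, Burger

print(info_gain(patrons_split, 6, 6))   # ≈ 0.541 bits
print(info_gain(type_split, 6, 6))      # 0.0 bits
```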
Example contd.
• Decision tree learned from the 12 examples:

[Figure: the learned decision tree, with Patrons? at the root]

• Substantially simpler than the “true” tree: a more complex
  hypothesis isn’t justified by the small amount of data
Performance measurement
•    How do we know that h ≈ f ?
    1. Use theorems of computational/statistical learning theory
    2. Try h on a new test set of examples
       (use the same distribution over the example space as the training set)
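
A minimal sketch of option 2 (names are illustrative): score h by its accuracy on held-out examples drawn from the same distribution:

```python
# Judge h ≈ f by prediction accuracy on a held-out test set of (x, f(x)) pairs.
def accuracy(h, test_set):
    return sum(h(x) == y for x, y in test_set) / len(test_set)

test_set = [(0, 0), (1, 1), (2, 4)]          # hypothetical held-out examples
print(accuracy(lambda x: x * x, test_set))   # 1.0
```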
Summary
• Learning needed for unknown environments,
  lazy designers
• Learning agent = performance element +
  learning element
• For supervised learning, the aim is to find a
  simple hypothesis approximately consistent with
  training examples
• Decision tree learning using information gain
• Learning performance = prediction accuracy
  measured on test set
