Lecture No. 2

       Ravi Gupta
 AU-KBC Research Centre,
MIT Campus, Anna University




                              Date: 8.3.2008
Today’s Agenda

•   Recap (FIND-S Algorithm)
•   Version Space
•   Candidate-Elimination Algorithm
•   Decision Tree
•   ID3 Algorithm
•   Entropy
Concept Learning as Search

Concept learning can be viewed as the task of searching through
a large space of hypotheses implicitly defined by the hypothesis
representation.


The goal of the concept learning search is to find the hypothesis
that best fits the training examples.
General-to-Specific Learning
                                                     Every day Tom enjoys his
                                                       sport, i.e., only
                                                     positive examples.


  Most General Hypothesis: h = <?, ?, ?, ?, ?, ?>




 Most Specific Hypothesis: h = < Ø, Ø, Ø, Ø, Ø, Ø>
General-to-Specific Learning




             h2 is more general than h1

     h2 imposes fewer constraints on the instance than h1
Definition

Given hypotheses hj and hk, hj is more_general_than_or_equal_to
hk if and only if any instance that satisfies hk also satisfies hj.




We can also say that hj is more_specific_than hk when hk is
more_general_than hj.
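
As a sketch (not from the slides), this relation can be checked attribute by attribute for the conjunctive hypotheses used in this lecture; the representation below, with "?" for "any value" and None for the empty constraint Ø, is an assumption:

def matches_nothing(h):
    # A hypothesis containing the empty constraint Ø is satisfied by no instance.
    return any(c is None for c in h)

def more_general_or_equal(hj, hk):
    """True if hj is more_general_than_or_equal_to hk: every instance
    satisfying hk also satisfies hj ('?' = any value, None = Ø)."""
    if matches_nothing(hk):
        return True          # hk matches nothing, so the claim holds vacuously
    if matches_nothing(hj):
        return False
    return all(cj == "?" or (ck != "?" and cj == ck) for cj, ck in zip(hj, hk))

h1 = ("Sunny", "Warm", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False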
FIND-S: Finding a Maximally
    Specific Hypothesis
Step 1: FIND-S




h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
Step 2: FIND-S




                          h0 = <Ø, Ø, Ø, Ø, Ø, Ø>

                     a1      a2    a3   a4   a5     a6

              x1 = <Sunny, Warm, Normal, Strong, Warm, Same>

Iteration 1

              h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

Iteration 2
                        x2 = <Sunny, Warm, High, Strong, Warm, Same>




              h2 = <Sunny, Warm, ?, Strong, Warm, Same>
Iteration 3   (x3 is a negative example, so FIND-S ignores it)   h3 = <Sunny, Warm, ?, Strong, Warm, Same>
h3 = < Sunny, Warm, ?, Strong, Warm, Same >


Iteration 4
                        x4 = < Sunny, Warm, High, Strong, Cool, Change >


Step 3

Output        h4 = <Sunny, Warm, ?, Strong, ?, ?>
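
A minimal Python sketch of FIND-S on the EnjoySport trace above, using the same "?"/Ø representation (an illustration, not the lecture's own code):

def find_s(examples):
    """FIND-S: start from the most specific hypothesis and minimally
    generalize it on each positive example; negatives are ignored."""
    n = len(examples[0][0])
    h = [None] * n                       # h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
    for x, positive in examples:
        if not positive:
            continue                     # FIND-S ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value             # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"               # conflicting value: generalize to '?'
    return h

training_data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(training_data))
# ['Sunny', 'Warm', '?', 'Strong', '?', '?'], i.e. h4 above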
Unanswered Questions by FIND-S

• Has the learner converged to the correct target
  concept?

• Why prefer the most specific hypothesis?

•   What if the training examples are inconsistent (contain errors)?
Version Space

The set of all hypotheses consistent with the training examples is
called the version space (VS) with respect to the hypothesis space H
and the given example set D.
Candidate-Elimination Algorithm

  The Candidate-Elimination algorithm finds all describable hypotheses
  that are consistent with the observed training examples.




  Hypotheses are refined using every example, regardless of whether x is a
  positive or a negative example.
Candidate-Elimination Algorithm




LIST-THEN-ELIMINATE Algorithm
    to Obtain Version Space
LIST-THEN-ELIMINATE Algorithm
    to Obtain Version Space
The version space VSH,D is the subset of the hypothesis space H consistent
with the examples D.
LIST-THEN-ELIMINATE Algorithm
    to Obtain Version Space

 • In principle, the LIST-THEN-ELIMINATE algorithm can be
 applied whenever the hypothesis space H is finite.

 • It is guaranteed to output all hypotheses consistent with the
 training data.

 • Unfortunately, it requires exhaustively enumerating all
 hypotheses in H, an unrealistic requirement for all but the most
 trivial hypothesis spaces.
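
A minimal sketch of LIST-THEN-ELIMINATE, assuming the conjunctive hypothesis space is enumerated explicitly (exactly the exhaustive enumeration that makes the algorithm impractical for larger spaces):

from itertools import product

def satisfies(h, x):
    # True if instance x satisfies conjunctive hypothesis h.
    return all(c == "?" or c == v for c, v in zip(h, x))

def list_then_eliminate(attribute_values, examples):
    # Enumerate every hypothesis (Ø hypotheses are omitted, since they can
    # never be consistent with a positive example), then keep those
    # consistent with every training example.
    hypothesis_space = product(*[values + ["?"] for values in attribute_values])
    return [h for h in hypothesis_space
            if all(satisfies(h, x) == label for x, label in examples)]

attribute_values = [["Sunny", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
                    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
version_space = list_then_eliminate(attribute_values, examples)
print(len(version_space))   # 6 hypotheses for the EnjoySport data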
Candidate-Elimination Algorithm

  • The CANDIDATE-ELIMINATION algorithm works on the same
  principle as the above LIST-THEN-ELIMINATE algorithm.

  • It employs a much more compact representation of the version
  space.

  • In it, the version space is represented by its most general and
  most specific (least general) members.

  • These members form general and specific boundary sets that delimit
  the version space within the partially ordered hypothesis space.
The version space is delimited by its specific boundary (its least general
members) and its general boundary (its most general members).
Candidate-Elimination Algorithm
Example




                 G0 ← {<?, ?, ?, ?, ?, ?>}

Initialization

                 S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}
G0 ← {<?, ?, ?, ?, ?, ?>}

                        S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}




              x1 = <Sunny, Warm, Normal, Strong, Warm, Same>
Iteration 1
                        G1 ← {<?, ?, ?, ?, ?, ?>}

              S1 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}




                 x2 = <Sunny, Warm, High, Strong, Warm, Same>
Iteration 2
                         G2 ← {<?, ?, ?, ?, ?, ?>}

              S2 ← {< Sunny, Warm, ?, Strong, Warm, Same >}
G2 ← {<?, ?, ?, ?, ?, ?>}

              S2 ← {< Sunny, Warm, ?, Strong, Warm, Same >}



                               (S2 is consistent with the negative example x3, so S3 = S2)



                x3 = <Rainy, Cold, High, Strong, Warm, Change>
Iteration 3
                S3 ← {< Sunny, Warm, ?, Strong, Warm, Same >}


       G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}




                             G2 ← {<?, ?, ?, ?, ?, ?>}
S3 ← {< Sunny, Warm, ?, Strong, Warm, Same >}


     G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}




                x4 = <Sunny, Warm, High, Strong, Cool, Change>
Iteration 4
                S4 ← {< Sunny, Warm, ?, Strong, ?, ? >}


                 G4 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}



         G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
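
The boundary-set updates traced above can be written compactly for this conjunctive hypothesis language. The sketch below is an illustration under the same "?"/Ø representation assumptions, and it omits pruning steps that never trigger on this example (e.g. removing non-maximal members of G):

def satisfies(h, x):
    # True if instance (or more specific hypothesis) x satisfies hypothesis h.
    return all(c == "?" or c == v for c, v in zip(h, x))

def min_specializations(g, x, attribute_values):
    # All minimal specializations of g that exclude the negative example x.
    specs = []
    for i, c in enumerate(g):
        if c == "?":
            for value in attribute_values[i]:
                if value != x[i]:
                    specs.append(g[:i] + (value,) + g[i + 1:])
    return specs

def candidate_elimination(examples, attribute_values):
    n = len(attribute_values)
    S = None                              # Ø: the most specific boundary so far
    G = [tuple(["?"] * n)]                # the most general hypothesis
    for x, positive in examples:
        if positive:
            G = [g for g in G if satisfies(g, x)]        # drop inconsistent members of G
            if S is None:
                S = tuple(x)                             # first minimal generalization of Ø
            else:
                S = tuple(s if s == v else "?" for s, v in zip(S, x))
        else:
            if S is not None and satisfies(S, x):
                S = None                                 # S wrongly covers a negative example
            new_G = []
            for g in G:
                if not satisfies(g, x):
                    new_G.append(g)                      # already excludes x
                else:
                    for spec in min_specializations(g, x, attribute_values):
                        # keep only specializations at least as general as S
                        if S is not None and satisfies(spec, S):
                            new_G.append(spec)
            G = new_G
    return S, G

attribute_values = [["Sunny", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
                    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"),   True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"),   True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples, attribute_values)
print(S)   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')  -- S4 above
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]  -- G4 above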
Remarks on Version Spaces and
    Candidate-Elimination

The version space learned by the CANDIDATE-ELIMINATION algorithm
will converge toward the hypothesis that correctly describes the target
concept, provided

 (1) there are no errors in the training examples, and

(2) there is some hypothesis in H that correctly
describes the target concept.
What will Happen if the Training
      Data Contains Errors?



                             (Suppose the second training example, actually positive, is incorrectly labeled No.)
G0 ← {<?, ?, ?, ?, ?, ?>}

                        S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}




              x1 = <Sunny, Warm, Normal, Strong, Warm, Same>
Iteration 1
                        G1 ← {<?, ?, ?, ?, ?, ?>}

              S1 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}




                 x2 = <Sunny, Warm, High, Strong, Warm, Same>
Iteration 2
                         G2 ← {<?, ?, Normal, ?, ?, ?>}

              S2 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}
G2 ← {<?, ?, Normal, ?, ?, ?>}

              S2 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}



                            consistent



               x3 = <Rainy, Cold, High, Strong, Warm, Change>
Iteration 3
               S3 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}

                          G3 ← {<?, ?, Normal, ?, ?, ?>}
S3 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}

                         G3 ← {<?, ?, Normal, ?, ?, ?>}




              x4 = <Sunny, Warm, High, Strong, Cool, Change>
Iteration 4
                              S4 ← { }
                                                        Empty

                              G4 ← { }




                       G3 ← {<?, ?, Normal, ?, ?, ?>}
What will Happen if the Target
       Hypothesis is not Present in H?
Remarks on Version Spaces and
    Candidate-Elimination


The target concept is exactly learned when
the S and G boundary sets converge to a
single, identical, hypothesis.
Remarks on Version Spaces and
    Candidate-Elimination

How Can Partially Learned Concepts Be Used?
  Suppose that no additional training examples are available beyond
  the four in our example, and the learner is now required to classify
  new instances that it has not yet observed.




   Recall that the target concept is exactly learned only when the S and G
   boundary sets converge to a single, identical hypothesis; here they have
   not, so the concept is only partially learned.
Remarks on Version Spaces and
    Candidate-Elimination
Remarks on Version Spaces and
    Candidate-Elimination



  All six hypotheses satisfied




  All six hypotheses satisfied
Remarks on Version Spaces and
    Candidate-Elimination


  Three hypotheses satisfied
  Three hypotheses not satisfied




  Two hypotheses satisfied
  Four hypotheses not satisfied
Remarks on Version Spaces and
    Candidate-Elimination


                         Yes
                         No
Decision Trees
Decision Trees


• Decision tree learning is a method for approximating
discrete-valued target functions, in which the learned function
is represented by a decision tree.

• Decision trees can also be represented as if-then-else rules.

• Decision tree learning is one of the most widely used
approaches to inductive inference.
Decision Trees




An instance is classified by starting at the root node of the tree, testing the
attribute specified by this node, then moving down the tree branch
corresponding to the value of the attribute in the given example. This process
is then repeated for the subtree rooted at the new node.
Decision Trees




<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>




                       PlayTennis = No
Decision Trees
Intermediate nodes test attributes, edges correspond to attribute values, and
leaf nodes give output values: attribute A1 is tested at the root, and each of
its values leads either to an output value or to a further attribute test
(A2, A3), whose values in turn lead to output values.
Decision Trees





Decision trees represent a disjunction of conjunctions of constraints
on the attribute values of instances.

Each path from the tree root to a leaf corresponds to a conjunction of
attribute tests, and the tree itself to a disjunction of these
conjunctions.
Decision Trees
Decision Trees (F = A ^ B')
                   F = A ^ B'
      If (A=True and B = False) then Yes
      else
          No

                                           If then else form
                 A
       False               True


      No
                            B
                  False           True



                     Yes          No
Decision Trees (F = A V (B ^ C))

 If (A=True) then Yes
 else if (B = True and C=True) then Yes              If then else form
       else No


                           A
                  True             False

                Yes
                                        B
                           False            True

                         No                 C
                                    False          True

                                   No              Yes
Decision Trees (F = A XOR B)
            F = (A ^ B') V (A' ^ B)

  If (A=True and B = False) then Yes
                                                        If then else form
  else If (A=False and B = True) then Yes
       else No


                                      A
                        False                   True



                        B                        B
            False                     False            True
                             True


           No                             Yes          No
                              Yes
Decision Trees as If-then-else rule




   If (Outlook = Sunny AND Humidity = Normal) then PlayTennis = Yes
   If (Outlook = Overcast) then PlayTennis = Yes
   If (Outlook = Rain AND Wind = Weak) then PlayTennis = Yes
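
As a sketch, these three rules (the disjunction of conjunctions above) translate directly into an if-then-else function; the Python form below is an illustration, not part of the slides:

def play_tennis(outlook, humidity, wind):
    # Decision tree for PlayTennis written as if-then-else rules.
    if outlook == "Sunny" and humidity == "Normal":
        return "Yes"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain" and wind == "Weak":
        return "Yes"
    return "No"

print(play_tennis("Sunny", "High", "Strong"))   # No, as in the earlier example instance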
Problems Suitable for Decision Trees

 • Instances are represented by attribute-value pairs
     Instances are described by a fixed set of attributes (e.g., Temperature) and
     their values (e.g., Hot). The easiest situation for decision tree learning is when
     each attribute takes on a small number of disjoint possible values (e.g., Hot,
     Mild, Cold). However, extensions to the basic algorithm allow handling real-
     valued attributes as well (e.g., representing Temperature numerically).

 • The target function has discrete output values

 • Disjunctive descriptions may be required

 • The training data may contain errors

 • The training data may contain missing attribute values
Basic Decision Tree Learning Algorithm

 • ID3 Algorithm (Quinlan 1986) and its
 successors C4.5 and C5.0

 • Employs a top-down, greedy search through the space of possible
   decision trees. The algorithm never backtracks to reconsider earlier
   choices.

 • To classify an instance, start at the root node of the tree, test the
   attribute specified by this node, then move down the tree branch
   corresponding to the value of the attribute in the given example, and
   repeat for the subtree rooted at the new node.

                                                         http://www.rulequest.com/Personal/
ID3 Algorithm
Example
Attributes…




Attributes are Outlook, Temperature, Humidity, Wind
Building Decision Tree
Building Decision Tree

A tree under construction: attribute A1 is chosen at the root; each of its
values leads either directly to an output value or to a subtree rooted at
another attribute (A2 or A3), each of whose values leads to an output value.
Building Decision Tree

Which attribute (Outlook, Temperature, Humidity, or Wind) should be selected
as the root node?
Which Attribute to Select ??

  • We would like to select the attribute that is most useful for
  classifying examples.

  • What is a good quantitative measure of the worth of an
  attribute?




  A statistical property called information gain measures how well a given
  attribute separates the training examples. ID3 uses this information gain
  measure to select among the candidate attributes at each step while
  growing the tree.
Information Gain

Information gain is based on an information-theory concept called Entropy.

Rudolf Julius Emanuel Clausius (January 2, 1822 – August 24, 1888) was a German
physicist and mathematician and is considered one of the central founders of the
science of thermodynamics.

Claude Elwood Shannon (April 30, 1916 – February 24, 2001), an American
electrical engineer and mathematician, has been called "the father of
information theory".

"Nothing in life is certain except death, taxes and the second law of
thermodynamics. All three are processes in which useful or accessible forms of
some quantity, such as energy or money, are transformed into useless,
inaccessible forms of the same quantity. That is not to say that these three
processes don’t have fringe benefits: taxes pay for roads and schools; the
second law of thermodynamics drives cars, computers and metabolism; and death,
at the very least, opens up tenured faculty positions."

Seth Lloyd, writing in Nature 430, 971 (26 August 2004).
Entropy

• In information theory, the Shannon entropy or
information entropy is a measure of the uncertainty
associated with a random variable.

• It quantifies the information contained in a
message, usually in bits or bits/symbol.

• It is the minimum message length necessary to
communicate information.
Why Shannon named his uncertainty
      function "entropy"?

                                                                                 John von
                                                                                 Neumann




 My greatest concern was what to call it. I thought of calling it 'information,' but the
 word was overly used, so I decided to call it 'uncertainty.' When I discussed it with
 John von Neumann, he had a better idea. Von Neumann told me, 'You should call
 it entropy, for two reasons. In the first place your uncertainty function has
 been used in statistical mechanics under that name, so it already has a name.
 In the second place, and more important, no one really knows what entropy
 really is, so in a debate you will always have the advantage.'
Shannon's mouse


Shannon and his famous
electromechanical mouse
Theseus, named after the Greek
mythology hero of Minotaur and
Labyrinth fame, and which he
tried to teach to come out of the
maze in one of the first
experiments in artificial
intelligence.
Entropy


The information entropy of a discrete random variable X, which can take on
possible values {x1, ..., xn}, is

    H(X) = E[I(X)] = - Σi p(xi) log2 p(xi)

where
   I(X) is the information content or self-information of X, which is itself a
   random variable; and
   p(xi) = Pr(X=xi) is the probability mass function of X.
Entropy in our Context

Given a collection S, containing positive and negative
examples of some target concept, the entropy of S relative to
this boolean classification (yes/no) is

    Entropy(S) = - p⊕ log2 p⊕ - p⊖ log2 p⊖

 where p⊕ is the proportion of positive examples in S and p⊖ is the
 proportion of negative examples in S. In all calculations involving
 entropy we define 0 log 0 to be 0.
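
A minimal Python sketch of this boolean-classification entropy (illustrative, not from the slides):

import math

def entropy(p_pos, p_neg):
    # Entropy(S) = -p+ log2 p+ - p- log2 p-, with 0 log 0 defined as 0.
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy(9/14, 5/14))   # ~0.940, the [9+, 5-] collection used below
print(entropy(0.5, 0.5))     # 1.0: maximum uncertainty
print(entropy(1.0, 0.0))     # 0.0: a perfectly pure collection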
Example




There are 14 examples. 9 positive and 5 negative examples [9+, 5-].

The entropy of S relative to this boolean (yes/no) classification is
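
For these counts the entropy works out to:

   Entropy([9+, 5-]) = - (9/14) log2(9/14) - (5/14) log2(5/14) = 0.940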
Information Gain Measure

 Information gain is simply the expected reduction in entropy
 caused by partitioning the examples according to this attribute.

 More precisely, the information gain, Gain(S, A), of an attribute A
 relative to a collection of examples S, is defined as

     Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)

 where Values(A) is the set of all possible values for attribute A,
 and Sv is the subset of S for which attribute A has value v, i.e.,
 Sv = {s ∈ S | A(s) = v}.
Information Gain Measure



In Gain(S, A), the first term is the entropy of the original collection S and
the second term is the expected entropy of S after partitioning on attribute A.

Gain(S, A) is the expected reduction in entropy caused by knowing the value of
attribute A.

Gain(S, A) is the information provided about the target function value, given
the value of some other attribute A. The value of Gain(S, A) is the number of
bits saved when encoding the target value of an arbitrary member of S, by
knowing the value of attribute A.
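
A minimal Python sketch of Gain(S, A), assuming each example is a dict of attribute values paired with a yes/no label; the function and variable names are illustrative, not from the slides:

import math
from collections import Counter

def entropy_of(labels):
    # Entropy of a labeled collection, with 0 log 0 taken as 0.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values() if c)

def information_gain(examples, labels, attribute):
    # Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv)
    n = len(labels)
    gain = entropy_of(labels)
    for v in set(e[attribute] for e in examples):
        sv = [lab for e, lab in zip(examples, labels) if e[attribute] == v]
        gain -= (len(sv) / n) * entropy_of(sv)
    return gain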
Example




There are 14 examples. 9 positive and 5 negative examples [9+, 5-].

The entropy of S relative to this boolean (yes/no) classification is
Gain (S, Attribute = Wind)
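
Assuming the standard 14-example PlayTennis data (Wind = Weak on 8 examples with [6+, 2-], Wind = Strong on 6 examples with [3+, 3-]), the calculation proceeds as:

   Gain(S, Wind) = Entropy(S) - (8/14) Entropy([6+, 2-]) - (6/14) Entropy([3+, 3-])
                 = 0.940 - (8/14)(0.811) - (6/14)(1.00)
                 = 0.048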
Gain (S,A)
Gain (SSunny,A)




Temperature           Humidity              Wind
(Hot) {0+, 2-}    (High) {0+, 3-}     (Weak) {1+, 2-}
(Mild) {1+, 1-}   (Normal) {2+, 0-}   (Strong) {1+, 1-}
(Cool) {1+, 0-}
Gain (SSunny,A)
                    Entropy(SSunny) = - { 2/5 log(2/5) + 3/5 log(3/5)} = 0.97095


                         Entropy(Hot) = 0
Temperature
(Hot) {0+, 2-}           Entropy(Mild) = 1
(Mild) {1+, 1-}          Entropy(Cool) = 0
(Cool) {1+, 0-}          Gain(SSunny, Temperature) = 0.97095 – 2/5*0 – 2/5*1 – 1/5*0 = 0.57095



    Humidity            Entropy(High) = 0
(High) {0+, 3-}         Entropy(Normal) = 0
(Normal) {2+, 0-}       Gain(SSunny, Humidity) = 0.97095 – 3/5*0 – 2/5*0 = 0.97095


                         Entropy(Weak) = 0.9183
      Wind
(Weak) {1+, 2-}          Entropy(Strong) = 1.0
(Strong) {1+, 1-}        Gain(SSunny, Wind) = 0.97095 – 3/5*0.9183 – 2/5*1 = 0.01997

Humidity has the highest gain, so it is selected as the test for the Sunny branch.
Modified Decision Tree
Gain (SRain,A)




Temperature           Humidity              Wind
(Hot) {0+, 0-}    (High) {1+, 1-}     (Weak) {3+, 0-}
(Mild) {2+, 1-}   (Normal) {2+, 1-}   (Strong) {0+, 2-}
(Cool) {1+, 1-}
Gain (SRain,A)
                    Entropy(SRain) = - { 3/5 log(3/5) + 2/5 log(2/5)} = 0.97095


                         Entropy(Hot) = 0 (no examples)
Temperature
(Hot) {0+, 0-}           Entropy(Mild) = 0.9183
(Mild) {2+, 1-}          Entropy(Cool) = 1.0
(Cool) {1+, 1-}          Gain(SRain, Temperature) = 0.97095 – 0 – 3/5*0.9183 – 2/5*1 = 0.01997



    Humidity            Entropy(High) = 1.0
(High) {1+, 1-}         Entropy(Normal) = 0.9183
(Normal) {2+, 1-}       Gain(SRain, Humidity) = 0.97095 – 2/5*1.0 – 3/5*0.9183 = 0.01997


                         Entropy(Weak) = 0.0
      Wind
(Weak) {3+, 0-}          Entropy(Strong) = 0.0
(Strong) {0+, 2-}        Gain(SRain, Wind) = 0.97095 – 3/5*0 – 2/5*0 = 0.97095

Wind has the highest gain, so it is selected as the test for the Rain branch.
Final Decision Tree
Home work
a1
(True) {2+, 1-}
(False) {1+, 2-}

Entropy(a1=True) = -{2/3log(2/3) + 1/3log(1/3)} = 0.9183
Entropy(a1=False) = 0.9183
Gain (S, a1) = 1 – 3/6*0.9183 – 3/6*0.9183 = 0.0817        S {3+, 3-} => Entropy(S) = 1




a2                 Entropy(a2=True) = 1.0
(True) {2+, 2-}    Entropy(a2=False) = 1.0
(False) {1+, 1-}   Gain (S, a2) = 1 – 4/6*1 – 2/6*1 = 0.0

a1 has the higher information gain, so it is chosen at the root.
Home work

               a1


      True              False




[D1, D2, D3]
                    [D4, D5, D6]
Home work

                             a1


            True                           False



      [D1, D2, D3]
                                      [D4, D5, D6]
          a2
                                           a2
  True               False
                                  True               False

+ (Yes)
               - (No)             - (No)
                                                     + (Yes)
Home work
                            a1


           True                           False


     [D1, D2, D3]
                                    [D4, D5, D6]
          a2
                                          a2
  True              False
                                 True              False

+ (Yes)
               - (No)            - (No)
                                                   + (Yes)


           (a1^a2) V (a1' ^ a2')
Some Insights into Capabilities and
       Limitations of ID3 Algorithm
•   ID3 searches a complete hypothesis space (the space of all decision
    trees). [Advantage]

•   ID3 maintains only a single current hypothesis as it searches through
    the space of decision trees. By committing to only a single
    hypothesis, ID3 loses the capabilities that follow from explicitly
    representing all consistent hypotheses. [Disadvantage]

•   ID3 in its pure form performs no backtracking in its search. Once it
    selects an attribute to test at a particular level in the tree, it never
    backtracks to reconsider this choice. Therefore, it is susceptible to
    the usual risks of hill-climbing search without backtracking:
    converging to locally optimal solutions that are not globally optimal.
    [Disadvantage]
Some Insights into Capabilities and
       Limitations of ID3 Algorithm

•   ID3 uses all training examples at each step in the search to make
    statistically based decisions regarding how to refine its current
    hypothesis. This contrasts with methods that make decisions
    incrementally, based on individual training examples (e.g., FIND-S
    or CANDIDATE-ELIMINATION). One advantage of using statistical
    properties of all the examples (e.g., information gain) is that the
    resulting search is much less sensitive to errors in individual training
    examples. [Advantage]
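
To tie the pieces together, here is a minimal ID3 sketch in Python that grows a tree by repeatedly choosing the attribute with the highest information gain. It is an illustration only: it omits the refinements of C4.5/C5.0 (pruning, continuous and missing attribute values), and the example data set is a reconstruction consistent with the counts on the homework slides, not the original table:

import math
from collections import Counter

def entropy_of(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values() if c)

def information_gain(examples, labels, attribute):
    n = len(labels)
    gain = entropy_of(labels)
    for v in set(e[attribute] for e in examples):
        sv = [lab for e, lab in zip(examples, labels) if e[attribute] == v]
        gain -= (len(sv) / n) * entropy_of(sv)
    return gain

def id3(examples, labels, attributes):
    """Return a nested dict {attribute: {value: subtree-or-class-label}}."""
    if len(set(labels)) == 1:
        return labels[0]                                   # pure node
    if not attributes:
        return Counter(labels).most_common(1)[0][0]        # majority class
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for v in set(e[best] for e in examples):
        branch = [(e, lab) for e, lab in zip(examples, labels) if e[best] == v]
        sub_examples = [e for e, _ in branch]
        sub_labels = [lab for _, lab in branch]
        remaining = [a for a in attributes if a != best]
        tree[best][v] = id3(sub_examples, sub_labels, remaining)   # greedy, no backtracking
    return tree

# Six examples consistent with the homework counts: a1 {2+,1- / 1+,2-}, a2 {2+,2- / 1+,1-}.
examples = [
    {"a1": True,  "a2": True},  {"a1": True,  "a2": True},  {"a1": True,  "a2": False},
    {"a1": False, "a2": True},  {"a1": False, "a2": True},  {"a1": False, "a2": False},
]
labels = ["Yes", "Yes", "No", "No", "No", "Yes"]
print(id3(examples, labels, ["a1", "a2"]))
# a1 at the root, a2 on each branch: the concept (a1 ^ a2) V (a1' ^ a2')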

Contenu connexe

Tendances

Using prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannUsing prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannswapnac12
 
ELEMENTS OF TRANSPORT PROTOCOL
ELEMENTS OF TRANSPORT PROTOCOLELEMENTS OF TRANSPORT PROTOCOL
ELEMENTS OF TRANSPORT PROTOCOLShashank Rustagi
 
1.1. the central concepts of automata theory
1.1. the central concepts of automata theory1.1. the central concepts of automata theory
1.1. the central concepts of automata theorySampath Kumar S
 
First order predicate logic (fopl)
First order predicate logic (fopl)First order predicate logic (fopl)
First order predicate logic (fopl)chauhankapil
 
Fuzzy relations
Fuzzy relationsFuzzy relations
Fuzzy relationsnaugariya
 
Instance based learning
Instance based learningInstance based learning
Instance based learningswapnac12
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm pptMayank Jain
 
Syntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address CodeSyntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address Codesanchi29
 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning systemswapnac12
 
Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}FellowBuddy.com
 

Tendances (20)

Truth management system
Truth  management systemTruth  management system
Truth management system
 
Using prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannUsing prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbann
 
PAC Learning
PAC LearningPAC Learning
PAC Learning
 
AI Lecture 7 (uncertainty)
AI Lecture 7 (uncertainty)AI Lecture 7 (uncertainty)
AI Lecture 7 (uncertainty)
 
ELEMENTS OF TRANSPORT PROTOCOL
ELEMENTS OF TRANSPORT PROTOCOLELEMENTS OF TRANSPORT PROTOCOL
ELEMENTS OF TRANSPORT PROTOCOL
 
1.1. the central concepts of automata theory
1.1. the central concepts of automata theory1.1. the central concepts of automata theory
1.1. the central concepts of automata theory
 
First order predicate logic (fopl)
First order predicate logic (fopl)First order predicate logic (fopl)
First order predicate logic (fopl)
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
 
Learning With Complete Data
Learning With Complete DataLearning With Complete Data
Learning With Complete Data
 
Fuzzy relations
Fuzzy relationsFuzzy relations
Fuzzy relations
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm ppt
 
Syntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address CodeSyntax-Directed Translation into Three Address Code
Syntax-Directed Translation into Three Address Code
 
Introdution and designing a learning system
Introdution and designing a learning systemIntrodution and designing a learning system
Introdution and designing a learning system
 
AI: Logic in AI
AI: Logic in AIAI: Logic in AI
AI: Logic in AI
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}Heuristic Search Techniques {Artificial Intelligence}
Heuristic Search Techniques {Artificial Intelligence}
 
supervised learning
supervised learningsupervised learning
supervised learning
 

En vedette

Machine learning Lecture 1
Machine learning Lecture 1Machine learning Lecture 1
Machine learning Lecture 1Srinivasan R
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3Srinivasan R
 
Machine learning Lecture 4
Machine learning Lecture 4Machine learning Lecture 4
Machine learning Lecture 4Srinivasan R
 
Zeromq - Pycon India 2013
Zeromq - Pycon India 2013Zeromq - Pycon India 2013
Zeromq - Pycon India 2013Srinivasan R
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Treesananth
 
Generative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsGenerative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsArtifacia
 

En vedette (7)

Machine learning Lecture 1
Machine learning Lecture 1Machine learning Lecture 1
Machine learning Lecture 1
 
Machine learning Lecture 3
Machine learning Lecture 3Machine learning Lecture 3
Machine learning Lecture 3
 
Candidate elimination example
Candidate elimination exampleCandidate elimination example
Candidate elimination example
 
Machine learning Lecture 4
Machine learning Lecture 4Machine learning Lecture 4
Machine learning Lecture 4
 
Zeromq - Pycon India 2013
Zeromq - Pycon India 2013Zeromq - Pycon India 2013
Zeromq - Pycon India 2013
 
Machine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision TreesMachine Learning Lecture 3 Decision Trees
Machine Learning Lecture 3 Decision Trees
 
Generative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsGenerative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their Applications
 

Similaire à Machine learning Lecture 2

Candidate elimination algorithm in ML Lab
Candidate elimination algorithm in ML LabCandidate elimination algorithm in ML Lab
Candidate elimination algorithm in ML LabVenkateswaraBabuRavi
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GANNAVER Engineering
 
Bounded arithmetic in free logic
Bounded arithmetic in free logicBounded arithmetic in free logic
Bounded arithmetic in free logicYamagata Yoriyuki
 
Supervised_Learning.ppt
Supervised_Learning.pptSupervised_Learning.ppt
Supervised_Learning.pptHari629251
 
Optimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processesOptimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processesEmmanuel Hadoux
 
Machine Learning Chapter 11 2
Machine Learning Chapter 11 2Machine Learning Chapter 11 2
Machine Learning Chapter 11 2butest
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
An optimal and progressive algorithm for skyline queries slide
An optimal and progressive algorithm for skyline queries slideAn optimal and progressive algorithm for skyline queries slide
An optimal and progressive algorithm for skyline queries slideWooSung Choi
 
2-Heuristic Search.ppt
2-Heuristic Search.ppt2-Heuristic Search.ppt
2-Heuristic Search.pptMIT,Imphal
 
Introduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theoristIntroduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theoristAkin Osman Kazakci
 
Bounded arithmetic in free logic
Bounded arithmetic in free logicBounded arithmetic in free logic
Bounded arithmetic in free logicYamagata Yoriyuki
 
Intro to Quant Trading Strategies (Lecture 8 of 10)
Intro to Quant Trading Strategies (Lecture 8 of 10)Intro to Quant Trading Strategies (Lecture 8 of 10)
Intro to Quant Trading Strategies (Lecture 8 of 10)Adrian Aley
 
A compact zero knowledge proof to restrict message space in homomorphic encry...
A compact zero knowledge proof to restrict message space in homomorphic encry...A compact zero knowledge proof to restrict message space in homomorphic encry...
A compact zero knowledge proof to restrict message space in homomorphic encry...MITSUNARI Shigeo
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learningbutest
 
Algorithm review
Algorithm reviewAlgorithm review
Algorithm reviewchidabdu
 

Similaire à Machine learning Lecture 2 (20)

Candidate elimination algorithm in ML Lab
Candidate elimination algorithm in ML LabCandidate elimination algorithm in ML Lab
Candidate elimination algorithm in ML Lab
 
ML02.ppt
ML02.pptML02.ppt
ML02.ppt
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
 
Bounded arithmetic in free logic
Bounded arithmetic in free logicBounded arithmetic in free logic
Bounded arithmetic in free logic
 
Supervised_Learning.ppt
Supervised_Learning.pptSupervised_Learning.ppt
Supervised_Learning.ppt
 
Optimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processesOptimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processes
 
Machine Learning Chapter 11 2
Machine Learning Chapter 11 2Machine Learning Chapter 11 2
Machine Learning Chapter 11 2
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
An optimal and progressive algorithm for skyline queries slide
An optimal and progressive algorithm for skyline queries slideAn optimal and progressive algorithm for skyline queries slide
An optimal and progressive algorithm for skyline queries slide
 
2-Heuristic Search.ppt
2-Heuristic Search.ppt2-Heuristic Search.ppt
2-Heuristic Search.ppt
 
Introduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theoristIntroduction to search and optimisation for the design theorist
Introduction to search and optimisation for the design theorist
 
Bounded arithmetic in free logic
Bounded arithmetic in free logicBounded arithmetic in free logic
Bounded arithmetic in free logic
 
Machine Learning 1
Machine Learning 1Machine Learning 1
Machine Learning 1
 
Intro to Quant Trading Strategies (Lecture 8 of 10)
Intro to Quant Trading Strategies (Lecture 8 of 10)Intro to Quant Trading Strategies (Lecture 8 of 10)
Intro to Quant Trading Strategies (Lecture 8 of 10)
 
A compact zero knowledge proof to restrict message space in homomorphic encry...
A compact zero knowledge proof to restrict message space in homomorphic encry...A compact zero knowledge proof to restrict message space in homomorphic encry...
A compact zero knowledge proof to restrict message space in homomorphic encry...
 
Hprec6 4
Hprec6 4Hprec6 4
Hprec6 4
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
Algorithm review
Algorithm reviewAlgorithm review
Algorithm review
 

Dernier

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Machine learning Lecture 2

  • 1. Lecture No. 2 Ravi Gupta AU-KBC Research Centre, MIT Campus, Anna University Date: 8.3.2008
  • 2. Today’s Agenda • Recap (FIND-S Algorithm) • Version Space • Candidate-Elimination Algorithm • Decision Tree • ID3 Algorithm • Entropy
  • 3. Concept Learning as Search Concept learning can be viewed as the task of searching through a large space of hypothesis implicitly defined by the hypothesis representation. The goal of the concept learning search is to find the hypothesis that best fits the training examples.
  • 4. General-to-Specific Learning Every day Tom his enjoy i.e., Only positive examples. Most General Hypothesis: h = <?, ?, ?, ?, ?, ?> Most Specific Hypothesis: h = < Ø, Ø, Ø, Ø, Ø, Ø>
  • 5. General-to-Specific Learning h2 is more general than h1 h2 imposes fewer constraints on the instance than h1
  • 6. Definition Given hypotheses hj and hk, hj is more_general_than_or_equal_to hk if and only if any instance that satisfies hk also satisfies hj. We can also say that hj is more_specific_than hk when hk is more_general_than hj.
  • 7. FIND-S: Finding a Maximally Specific Hypothesis
  • 8. Step 1: FIND-S h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
  • 9. Step 2: FIND-S h0 = <Ø, Ø, Ø, Ø, Ø, Ø> a1 a2 a3 a4 a5 a6 x1 = <Sunny, Warm, Normal, Strong, Warm, Same> Iteration 1 h1 = <Sunny, Warm, Normal, Strong, Warm, Same>
  • 10. h1 = <Sunny, Warm, Normal, Strong, Warm, Same> Iteration 2 x2 = <Sunny, Warm, High, Strong, Warm, Same> h2 = <Sunny, Warm, ?, Strong, Warm, Same>
  • 11. Iteration 3 Ignore h3 = <Sunny, Warm, ?, Strong, Warm, Same>
  • 12. h3 = < Sunny, Warm, ?, Strong, Warm, Same > Iteration 4 x4 = < Sunny, Warm, High, Strong, Cool, Change > Step 3 Output h4 = <Sunny, Warm, ?, Strong, ?, ?>
  • 13. Unanswered Questions by FIND-S • Has the learner converged to the correct target concept? • Why prefer the most specific hypothesis? • What if the training examples consistent?
  • 14. Version Space The set of all valid hypotheses provided by an algorithm is called version space (VS) with respect to the hypothesis space H and the given example set D.
  • 15. Candidate-Elimination Algorithm The Candidate-Elimination algorithm finds all describable hypotheses that are consistent with the observed training examples Hypothesis is derived from examples regardless of whether x is positive or negative example
  • 16. Candidate-Elimination Algorithm Earlier (i.e., FIND-S) Def.
  • 17. LIST-THEN-ELIMINATE Algorithm to Obtain Version Space
  • 18. LIST-THEN-ELIMINATE Algorithm to Obtain Version Space Examples Hypothesis Space . Version Space . . . . VSH,D . H D
  • 19. LIST-THEN-ELIMINATE Algorithm to Obtain Version Space • In principle, the LIST-THEN-ELIMINATE algorithm can be applied whenever the hypothesis space H is finite. • It is guaranteed to output all hypotheses consistent with the training data. • Unfortunately, it requires exhaustively enumerating all hypotheses in H-an unrealistic requirement for all but the most trivial hypothesis spaces.
  • 20. Candidate-Elimination Algorithm • The CANDIDATE-ELIMINATION algorithm works on the same principle as the above LIST-THEN-ELIMINATE algorithm. • It employs a much more compact representation of the version space. • In this the version space is represented by its most general and least general members (Specific). • These members form general and specific boundary sets that delimit the version space within the partially ordered hypothesis space.
  • 21. Least General (Specific) Most General
  • 23. Example G0 ← {<?, ?, ?, ?, ?, ?>} Initialization S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >}
  • 24. G0 ← {<?, ?, ?, ?, ?, ?>} S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >} x1 = <Sunny, Warm, Normal, Strong, Warm, Same> Iteration 1 G1 ← {<?, ?, ?, ?, ?, ?>} S1 ← {< Sunny, Warm, Normal, Strong, Warm, Same >} x2 = <Sunny, Warm, High, Strong, Warm, Same> Iteration 2 G2 ← {<?, ?, ?, ?, ?, ?>} S2 ← {< Sunny, Warm, ?, Strong, Warm, Same >}
  • 25. G2 ← {<?, ?, ?, ?, ?, ?>} S2 ← {< Sunny, Warm, ?, Strong, Warm, Same >} consistent x3 = <Rainy, Cold, High, Strong, Warm, Change> Iteration 3 S3 ← {< Sunny, Warm, ?, Strong, Warm, Same >} G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, Same>} G2 ← {<?, ?, ?, ?, ?, ?>}
  • 26. S3 ← {< Sunny, Warm, ?, Strong, Warm, Same >} G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, Same>} x4 = <Sunny, Warm, high, Strong, Cool, Change> Iteration 4 S4 ← {< Sunny, Warm, ?, Strong, ?, ? >} G4 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>} G3 ← {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, Same>}
  • 27.
  • 28.
  • 29. Remarks on Version Spaces and Candidate-Elimination The version space learned by the CANDIDATE-ELIMINATION algorithm will converge toward the hypothesis that correctly describes the target concept, provided (1) there are no errors in the training examples, and (2) there is some hypothesis in H that correctly describes the target concept.
  • 30. What will Happen if the Training Contains errors ? No
  • 31. G0 ← {<?, ?, ?, ?, ?, ?>} S0 ← {<Ø, Ø, Ø, Ø, Ø, Ø >} x1 = <Sunny, Warm, Normal, Strong, Warm, Same> Iteration 1 G1 ← {<?, ?, ?, ?, ?, ?>} S1 ← {< Sunny, Warm, Normal, Strong, Warm, Same >} x2 = <Sunny, Warm, High, Strong, Warm, Same> Iteration 2 G2 ← {<?, ?, Normal, ?, ?, ?>} S2 ← {< Sunny, Warm, Normal, Strong, Warm, Same >}
  • 32. G2 ← {<?, ?, Normal, ?, ?, ?>} S2 ← {< Sunny, Warm, Normal, Strong, Warm, Same >} consistent x3 = <Rainy, Cold, High, Strong, Warm, Change> Iteration 3 S3 ← {< Sunny, Warm, Normal, Strong, Warm, Same >} G3 ← {<?, ?, Normal, ?, ?, ?>}
  • 33. S3 ← {< Sunny, Warm, Normal, Strong, Warm, Same >} G3 ← {<?, ?, Normal, ?, ?, ?>} x4 = <Sunny, Warm, high, Strong, Cool, Change> Iteration 4 S4 ← { } Empty G4 ← { } G3 ← {<?, ?, Normal, ?, ?, ?>}
  • 34. What will Happen if Hypothesis is not Present ?
  • 35. Remarks on Version Spaces and Candidate-Elimination The target concept is exactly learned when the S and G boundary sets converge to a single, identical, hypothesis.
  • 36. Remarks on Version Spaces and Candidate-Elimination How Can Partially Learned Concepts Be Used? Suppose that no additional training examples are available beyond the four in our example. And the learner is now required to classify new instances that it has not yet observed. The target concept is exactly learned when the S and G boundary sets converge to a single, identical, hypothesis.
  • 37. Remarks on Version Spaces and Candidate-Elimination
  • 38. Remarks on Version Spaces and Candidate-Elimination All six hypotheses satisfied All six hypotheses satisfied
  • 39. Remarks on Version Spaces and Candidate-Elimination Three hypotheses satisfied Three hypotheses not satisfied Two hypotheses satisfied Four hypotheses not satisfied
  • 40. Remarks on Version Spaces and Candidate-Elimination Yes No
  • 42. Decision Trees • Decision tree learning is a method for approximating discrete value target functions, in which the learned function is represented by a decision tree. • Decision trees can also be represented by if-then-else rule. • Decision tree learning is one of the most widely used approach for inductive inference .
  • 43. Decision Trees An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example. This process is then repeated for the subtree rooted at the new node.
  • 44. Decision Trees <Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong> PlayTennis = No
  • 45. Decision Trees Edges: Attribute value Intermediate Nodes: Attributes Attribute: A1 Attribute Attribute value Attribute value value Attribute: A2 Output Attribute: A3 value Attribute Attribute Attribute Attribute value value value value Output Output Output Output value value value value Leave node: Output value
  • 46. Decision Trees conjunction disjunction Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself to a disjunction of these conjunctions.
  • 48. Decision Trees (F = A ^ B') F = A ^ B‘ If (A=True and B = False) then Yes else No If then else form A False True No B False True Yes No
  • 49. Decision Trees (F = A V (B ^ C)) If (A=True) then Yes else if (B = True and C=True) then Yes If then else form else No A True False Yes B False True No C False True No Yes
  • 50. Decision Trees (F = A XOR B) F = (A ^ B') V (A' ^ B) If (A=True and B = False) then Yes If then else form else If (A=False and B = False) then Yes else No A False True B B False False True True No Yes No Yes
  • 51. Decision Trees as If-then-else rule conjunction disjunction If (Outlook = Sunny AND humidity = Normal) then PlayTennis = Yes If (Outlook = Overcast) then PlayTennis = Yes If (Outlook = Rain AND Wind = Weak) then PlayTennis = Yes
  • 52. Problems Suitable for Decision Trees • Instances are represented by attribute-value pairs Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions to the basic algorithm allow handling real- valued attributes as well (e.g., representing Temperature numerically). • The target function has discrete output values • Disjunctive descriptions may be required • The training data may contain errors • The training data may contain missing attribute values
• 53. Basic Decision Tree Learning Algorithm • ID3 Algorithm (Quinlan 1986) and its successors C4.5 and C5.0 (http://www.rulequest.com/Personal/) • Employs a top-down, greedy search through the space of possible decision trees: the best attribute is chosen at each node, and the algorithm never backtracks to reconsider earlier choices.
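A minimal, self-contained sketch of that top-down greedy procedure (a simplified ID3, assuming discrete attributes and examples given as dictionaries); it is illustrative only and omits practical refinements such as pruning or handling of missing values.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy from partitioning the examples on attribute."""
    before = entropy([e[target] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return before - remainder

def id3(examples, attributes, target):
    """Grow a tree top-down, greedily picking the best attribute; never backtracks."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                        # all examples agree: leaf node
        return labels[0]
    if not attributes:                               # nothing left to test: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for value in {e[best] for e in examples}:        # one branch per observed value
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target)
    return tree
```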
• 56. Attributes The attributes are Outlook, Temperature, Humidity, and Wind.
• 58. Building Decision Tree (schematic) The root tests attribute A1; each branch carries an attribute value and leads either directly to an output value or to a further attribute test (A2 or A3), whose branches in turn end in output values.
• 59. Building Decision Tree Which attribute (Outlook, Temperature, Humidity, or Wind) should be selected for the root node?
• 60. Which Attribute to Select? • We would like to select the attribute that is most useful for classifying examples. • What is a good quantitative measure of the worth of an attribute? ID3 uses a measure called information gain to select among the candidate attributes at each step while growing the tree.
• 61. Information Gain Information gain is based on an information-theory concept called entropy. "Nothing in life is certain except death, taxes and the second law of thermodynamics. All three are processes in which useful or accessible forms of some quantity, such as energy or money, are transformed into useless, inaccessible forms of the same quantity. That is not to say that these three processes don't have fringe benefits: taxes pay for roads and schools; the second law of thermodynamics drives cars, computers and metabolism; and death, at the very least, opens up tenured faculty positions." (Seth Lloyd, writing in Nature 430, 971, 26 August 2004.) Pictured: Rudolf Julius Emanuel Clausius (January 2, 1822 – August 24, 1888), a German physicist and mathematician considered one of the central founders of the science of thermodynamics, and Claude Elwood Shannon (April 30, 1916 – February 24, 2001), an American electrical engineer and mathematician who has been called "the father of information theory".
• 62. Entropy • In information theory, the Shannon entropy or information entropy is a measure of the uncertainty associated with a random variable. • It quantifies the information contained in a message, usually in bits or bits/symbol. • It gives the minimum average message length, in bits, needed to communicate the information.
• 63. Why did Shannon name his uncertainty function "entropy"? (Shannon, recalling a conversation with John von Neumann:) "My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'"
• 64. Shannon's mouse Shannon and his famous electromechanical mouse Theseus, named after the hero of Greek mythology famed for escaping the Minotaur's labyrinth, which he tried to teach to find its way out of a maze in one of the first experiments in artificial intelligence.
• 65. Entropy The information entropy of a discrete random variable X that can take on possible values {x1, ..., xn} is H(X) = E[I(X)] = - Σi p(xi) log2 p(xi), where I(X) is the information content or self-information of X, which is itself a random variable, and p(xi) = Pr(X = xi) is the probability mass function of X.
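A minimal sketch of this definition in Python, computing H(X) from a probability mass function given as a list of probabilities; the function name is illustrative.

```python
from math import log2

def shannon_entropy(probabilities):
    """H(X) = -sum over i of p(x_i) * log2 p(x_i); terms with p = 0 contribute 0."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

print(shannon_entropy([0.5, 0.5]))                  # 1.0 bit: a fair coin
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))    # 2.0 bits: four equally likely outcomes
```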
• 66. Entropy in our Context Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this boolean (yes/no) classification is Entropy(S) = - p⊕ log2 p⊕ - p⊖ log2 p⊖, where p⊕ is the proportion of positive examples in S and p⊖ is the proportion of negative examples in S. In all calculations involving entropy we define 0 log 0 to be 0.
• 67. Example There are 14 examples: 9 positive and 5 negative [9+, 5-]. The entropy of S relative to this boolean (yes/no) classification is Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940.
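The same number can be checked with a few lines of Python (a worked check, not part of the original slide):

```python
from math import log2

p_pos, p_neg = 9 / 14, 5 / 14
entropy_S = -p_pos * log2(p_pos) - p_neg * log2(p_neg)
print(entropy_S)   # approximately 0.940
```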
• 68. Information Gain Measure Information gain is simply the expected reduction in entropy caused by partitioning the examples according to an attribute. More precisely, the information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as Gain(S, A) = Entropy(S) - Σ over v in Values(A) of (|Sv| / |S|) Entropy(Sv), where Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has value v, i.e., Sv = {s in S | A(s) = v}.
• 69. Information Gain Measure The first term is the entropy of S; the second term is the expected entropy of S after partitioning on A. Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A. Equivalently, Gain(S, A) is the information provided about the target function value, given the value of some other attribute A; it is the number of bits saved when encoding the target value of an arbitrary member of S, by knowing the value of attribute A.
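A minimal sketch of this measure, working directly from the [positive, negative] counts used throughout these slides; the helper names are illustrative. The Outlook counts in the usage line are derived from the later slides (SSunny is [2+, 3-] and SRain is [3+, 2-], so Overcast must be [4+, 0-] out of [9+, 5-]).

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a [pos+, neg-] collection; 0 log 0 is taken to be 0."""
    total = pos + neg
    return -sum((k / total) * log2(k / total) for k in (pos, neg) if k)

def gain(pos, neg, partition):
    """Gain(S, A) for a partition given as a list of (pos, neg) counts, one per value of A."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in partition)
    return entropy(pos, neg) - remainder

# Example: Gain(S, Outlook) for S = [9+, 5-] split into Sunny, Overcast, Rain subsets.
print(gain(9, 5, [(2, 3), (4, 0), (3, 2)]))   # approximately 0.247
```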
• 70. Example (recap of slide 67) S contains 14 examples, 9 positive and 5 negative [9+, 5-], so Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940.
• 73. Gain(SSunny, A) Temperature: Hot {0+, 2-}, Mild {1+, 1-}, Cool {1+, 0-}. Humidity: High {0+, 3-}, Normal {2+, 0-}. Wind: Weak {1+, 2-}, Strong {1+, 1-}.
• 74. Gain(SSunny, A) Entropy(SSunny) = -{(2/5) log2(2/5) + (3/5) log2(3/5)} = 0.97095. Temperature (Hot {0+, 2-}, Mild {1+, 1-}, Cool {1+, 0-}): Entropy(Hot) = 0, Entropy(Mild) = 1, Entropy(Cool) = 0; Gain(SSunny, Temperature) = 0.97095 - 2/5*0 - 2/5*1 - 1/5*0 = 0.57095. Humidity (High {0+, 3-}, Normal {2+, 0-}): Entropy(High) = 0, Entropy(Normal) = 0; Gain(SSunny, Humidity) = 0.97095 - 3/5*0 - 2/5*0 = 0.97095. Wind (Weak {1+, 2-}, Strong {1+, 1-}): Entropy(Weak) = 0.9183, Entropy(Strong) = 1.0; Gain(SSunny, Wind) = 0.97095 - 3/5*0.9183 - 2/5*1 = 0.01997.
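An illustrative check of these numbers in a few lines of Python; the helper H is a compact entropy-from-counts function, analogous to the gain sketch above.

```python
from math import log2

def H(pos, neg):
    """Entropy of a [pos+, neg-] set (0 log 0 taken as 0)."""
    total = pos + neg
    return -sum((k / total) * log2(k / total) for k in (pos, neg) if k)

e_sunny = H(2, 3)                                                  # approximately 0.971
print(e_sunny - 2/5 * H(0, 2) - 2/5 * H(1, 1) - 1/5 * H(1, 0))     # Temperature: approx. 0.571
print(e_sunny - 3/5 * H(0, 3) - 2/5 * H(2, 0))                     # Humidity:    approx. 0.971
print(e_sunny - 3/5 * H(1, 2) - 2/5 * H(1, 1))                     # Wind:        approx. 0.020
# Humidity has the highest gain over SSunny, so it is tested below the Sunny branch.
```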
• 76. Gain(SRain, A) Temperature: Hot {0+, 0-}, Mild {2+, 1-}, Cool {1+, 1-}. Humidity: High {1+, 1-}, Normal {2+, 1-}. Wind: Weak {3+, 0-}, Strong {0+, 2-}.
• 77. Gain(SRain, A) Entropy(SRain) = -{(3/5) log2(3/5) + (2/5) log2(2/5)} = 0.97095. Temperature (Hot {0+, 0-}, Mild {2+, 1-}, Cool {1+, 1-}): Entropy(Hot) = 0, Entropy(Mild) = 0.9183, Entropy(Cool) = 1.0; Gain(SRain, Temperature) = 0.97095 - 0 - 3/5*0.9183 - 2/5*1 = 0.01997. Humidity (High {1+, 1-}, Normal {2+, 1-}): Entropy(High) = 1.0, Entropy(Normal) = 0.9183; Gain(SRain, Humidity) = 0.97095 - 2/5*1.0 - 3/5*0.9183 = 0.01997. Wind (Weak {3+, 0-}, Strong {0+, 2-}): Entropy(Weak) = 0.0, Entropy(Strong) = 0.0; Gain(SRain, Wind) = 0.97095 - 3/5*0 - 2/5*0 = 0.97095.
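These values can be checked the same way (illustrative only):

```python
from math import log2

def H(pos, neg):
    """Entropy of a [pos+, neg-] set (0 log 0 taken as 0)."""
    total = pos + neg
    return -sum((k / total) * log2(k / total) for k in (pos, neg) if k)

e_rain = H(3, 2)                                     # approximately 0.971
print(e_rain - 3/5 * H(2, 1) - 2/5 * H(1, 1))        # Temperature: approx. 0.020
print(e_rain - 2/5 * H(1, 1) - 3/5 * H(2, 1))        # Humidity:    approx. 0.020
print(e_rain - 3/5 * H(3, 0) - 2/5 * H(0, 2))        # Wind:        approx. 0.971
# Wind has the highest gain over SRain, so it is tested below the Rain branch.
```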
• 81. Home work S = {3+, 3-} => Entropy(S) = 1. a1 (True {2+, 1-}, False {1+, 2-}): Entropy(a1=True) = -{(2/3) log2(2/3) + (1/3) log2(1/3)} = 0.9183, Entropy(a1=False) = 0.9183; Gain(S, a1) = 1 - 3/6*0.9183 - 3/6*0.9183 = 0.0817. a2 (True {2+, 2-}, False {1+, 1-}): Entropy(a2=True) = 1.0, Entropy(a2=False) = 1.0; Gain(S, a2) = 1 - 4/6*1 - 2/6*1 = 0.0.
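An illustrative check of the two gains, using the same compact entropy-from-counts helper as above:

```python
from math import log2

def H(pos, neg):
    """Entropy of a [pos+, neg-] set."""
    total = pos + neg
    return -sum((k / total) * log2(k / total) for k in (pos, neg) if k)

print(1 - 3/6 * H(2, 1) - 3/6 * H(1, 2))   # Gain(S, a1): approximately 0.082
print(1 - 4/6 * H(2, 2) - 2/6 * H(1, 1))   # Gain(S, a2): 0 (up to floating-point rounding)
```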
• 82. Home work The root attribute a1 splits the examples: a1 = True gives [D1, D2, D3] and a1 = False gives [D4, D5, D6].
• 83. Home work Each branch is then split on a2: under a1 = True, a2 = True is labelled + (Yes) and a2 = False is labelled - (No); under a1 = False, a2 = True is labelled - (No) and a2 = False is labelled + (Yes).
• 84. Home work The resulting tree represents the concept (a1 ^ a2) V (a1' ^ a2').
• 85. Some Insights into Capabilities and Limitations of ID3 Algorithm • ID3 searches a complete hypothesis space, one capable of expressing any finite discrete-valued function as a decision tree. [Advantage] • ID3 maintains only a single current hypothesis as it searches through the space of decision trees. By committing to a single hypothesis, ID3 loses the capabilities that follow from explicitly representing all consistent hypotheses, as the candidate-elimination algorithm does. [Disadvantage] • ID3 in its pure form performs no backtracking in its search. Once it selects an attribute to test at a particular level in the tree, it never backtracks to reconsider this choice. It is therefore susceptible to the usual risks of hill-climbing search without backtracking: converging to locally optimal solutions that are not globally optimal. [Disadvantage]
  • 86. Some Insights into Capabilities and Limitations of ID3 Algorithm • ID3 uses all training examples at each step in the search to make statistically based decisions regarding how to refine its current hypothesis. This contrasts with methods that make decisions incrementally, based on individual training examples (e.g., FIND-S or CANDIDATE-ELIMINATION). One advantage of using statistical properties of all the examples (e.g., information gain) is that the resulting search is much less sensitive to errors in individual training examples. [Advantage]