Introduction to Machine Learning
                  Lecture 21
      Reinforcement Learning

                 Albert Orriols i Puig
             http://www.albertorriols.net
                aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull
Recap of Lectures 5-18
Supervised learning
        Data classification
                Labeled data
                Build a model that
                covers all the space

Unsupervised learning
        Clustering
                Unlabeled data
                Group similar objects

        Association rule analysis
                Unlabeled data
                Get the most frequent/important associations

Genetic Fuzzy Systems
                                                               Slide 2
Artificial Intelligence                    Machine Learning
Today’s Agenda


        Introduction
        Reinforcement Learning
        Some examples before going further




                                                  Slide 3
Introduction
        What does reinforcement learning aim at?
                Learning from interaction (with environment)

                Goal-directed learning

                [Figure: agent-environment loop with the labels Goal, Agent, Action, State, Environment]


                Learning what to do and its effect
                          Trial-and-error search and delayed reward
                                                                                  Slide 4
Introduction

        Learn reactive behaviors
        Behaviors as a mapping between perceptions and actions
        The agent has to exploit what it already knows in order to
        obtain reward, but it also has to explore in order to make
        better action selections in the future.
        Dilemma: neither exploitation nor exploration can be
        pursued exclusively without failing at the task.
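
One simple way to balance the two is an epsilon-greedy rule. This is only a sketch: the slides name no particular strategy, and the function name `epsilon_greedy`, the parameter `epsilon`, and the list of value estimates are assumptions made for illustration. With probability `epsilon` the agent explores by picking a random action; otherwise it exploits its best current estimate.

```python
import random


def epsilon_greedy(value_estimates, epsilon, rng):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(value_estimates))
    # Exploit: pick the first action with the highest estimated value.
    return value_estimates.index(max(value_estimates))
```

Setting `epsilon` to 0 gives pure exploitation and setting it to 1 gives pure exploration; the dilemma above is the reason neither extreme is normally used.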




                                                             Slide 5
How Can We Learn It?
        1. Look-up tables

             Perception     Action
             State 1        Action 1
             State 2        Action 2
             State 3        Action 3
             …              …

        2. Neural networks
        3. Rules
        4. Finite automata
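
As a minimal sketch of the first option, a look-up table can be held in a dictionary that maps each perceived state to an action. The state and action names below are placeholders taken from the table on the slide, and the names `policy_table` and `act` are hypothetical.

```python
# Placeholder state and action names, for illustration only.
policy_table = {
    "state_1": "action_1",
    "state_2": "action_2",
    "state_3": "action_3",
}


def act(perception):
    """Reactive behavior: return the action stored for the perceived state."""
    return policy_table[perception]
```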




                                                                    Slide 6
Reinforcement Learning




                                                               Slide 7
Reinforcement Learning
                [Figure: agent-environment loop. The agent receives state s_t and reward r_t from the environment and sends back action a_t.]

                Reward function
                          r: S → ℝ   or   r: S × A → ℝ

                Agent and environment interact at discrete time steps t = 0, 1, 2, …

                The agent
                          observes the state at step t: s_t ∈ S
                          produces an action at step t: a_t ∈ A(s_t)
                          gets the resulting reward: r_{t+1} ∈ ℝ
                          moves to the next state s_{t+1}
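
The loop above can be sketched in a few lines of Python. Everything in this block is assumed for illustration and does not come from the slides: `CorridorEnv` is a hypothetical toy environment with five cells, `always_right` is a hypothetical fixed policy, and the `reset`/`step` method names are just one common convention.

```python
class CorridorEnv:
    """Hypothetical toy environment: cells 0..4, reward 1.0 on reaching cell 4."""

    GOAL = 4

    def reset(self):
        self.position = 0
        return self.position                      # initial state s_0

    def available_actions(self, state):
        return [-1, +1]                           # A(s_t): step left or right

    def step(self, action):
        # Move, clamped to the corridor boundaries.
        self.position = min(max(self.position + action, 0), self.GOAL)
        done = self.position == self.GOAL
        reward = 1.0 if done else 0.0
        return self.position, reward, done        # s_{t+1}, r_{t+1}, end flag


def always_right(state, actions):
    """Hypothetical fixed policy: always choose the last action (+1)."""
    return actions[-1]


def run_episode(env, policy, max_steps=100):
    """Run one trial and return the trace of (s_t, a_t, r_{t+1}) triples."""
    trace = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state, env.available_actions(state))
        next_state, reward, done = env.step(action)
        trace.append((state, action, reward))
        state = next_state
        if done:
            break
    return trace
```

Calling `run_episode(CorridorEnv(), always_right)` yields one trial in which the agent walks from cell 0 to the goal and receives its only reward on the last step.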

                                                                                       Slide 8
Reinforcement Learning
                [Figure: agent-environment loop, as on the previous slide]

                Trace of a trial

       … r_t, s_t --a_t--> r_{t+1}, s_{t+1} --a_{t+1}--> r_{t+2}, s_{t+2} --a_{t+2}--> r_{t+3}, s_{t+3} --a_{t+3}--> …

                Agent goal:
                          Maximize the total amount of reward it receives

                That means maximizing not just the immediate reward,
                but the cumulative reward in the long run
                                                                                                Slide 9
Example of RL
        Example: Recycling robot
                State
                          charge level of battery

                Actions
                          look for cans, wait for cans, go recharge

                Reward
                          positive for finding cans, negative for running out of battery




                                                                                    Slide 10
More precisely…
        Restricting to Markov decision processes (MDPs)
                Finite set of situations
                Finite set of actions
                Transition probabilities
                Reward probabilities

                This means that
                          The agent needs to have complete information about the world
                          State s_{t+1} depends only on state s_t and action a_t
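
The formulas for the transition and reward probabilities did not survive in this copy of the slides. In the usual MDP notation they are commonly written as below; the symbols T and R are assumptions here (T matches the T(s,a,s') used later in the lecture), and this is the standard textbook form rather than necessarily the exact one shown on the slide.

```latex
T(s, a, s') = \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}
\qquad
R(s, a, s') = \mathbb{E}\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}
```

Both quantities are conditioned only on the current state and action, which is exactly the Markov property stated in the last bullet.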
                                                                                Slide 11
Recycling Robot Example

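        [Figure: transition graph of the recycling robot with states High and Low; each arrow is labeled with its probability and reward]

             State   Action     Next state   Probability   Reward
             high    search     high         α             R_search
             high    search     low          1 − α         R_search
             low     search     high         1 − β         −3
             low     search     low          β             R_search
             high    wait       high         1             R_wait
             low     wait       low          1             R_wait
             low     recharge   high         1             0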


                                                                                        Slide 12
Recycling Robot Example
S = {high, low}
A(high) = {wait, search}
A(low) = {wait, search, recharge}

R_search: expected number of cans while searching
R_wait: expected number of cans while waiting
R_search > R_wait
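
The recycling robot is small enough to write down as a table-driven MDP. The sketch below is one possible encoding, not the lecture's own: the numeric values of `ALPHA`, `BETA`, `R_SEARCH` and `R_WAIT` are made up (the slides only require R_search > R_wait), and the helper names are hypothetical.

```python
import random

ALPHA, BETA = 0.9, 0.6           # assumed probabilities, not from the slides
R_SEARCH, R_WAIT = 2.0, 1.0      # assumed rewards; only R_SEARCH > R_WAIT is given

# (state, action) -> list of (probability, next_state, reward)
TRANSITIONS = {
    ("high", "search"): [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
    ("low", "search"): [(BETA, "low", R_SEARCH), (1 - BETA, "high", -3.0)],
    ("high", "wait"): [(1.0, "high", R_WAIT)],
    ("low", "wait"): [(1.0, "low", R_WAIT)],
    ("low", "recharge"): [(1.0, "high", 0.0)],
}


def actions(state):
    """A(s): the actions available in a state."""
    return sorted(a for (s, a) in TRANSITIONS if s == state)


def step(state, action, rng):
    """Sample (next_state, reward) according to the transition table."""
    outcomes = TRANSITIONS[(state, action)]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def expected_reward(state, action):
    """Expected immediate reward of taking an action in a state."""
    return sum(p * r for p, _, r in TRANSITIONS[(state, action)])
```

With these assumed numbers, searching on a low battery has an expected immediate reward of about zero, because the penalty of −3 for running flat offsets the cans found.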




                                                                                   Slide 13
Breaking the Markovian Property
        Problems that may not satisfy the MDP assumptions
                When the sets of actions and states are not finite
                          Solution: discretize the sets of actions and states
                When transition probabilities do not depend only on the current
                state
                          Possible solution: represent states as structures built up
                          over time from sequences of sensations
                          This is a POMDP (partially observable MDP)
                          Use POMDP algorithms to solve these problems
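
As a sketch of the first solution, a continuous quantity can be mapped to a finite set of states by cutting its range into equal-width bins. The function name `make_discretizer` and the choice of equal-width bins are assumptions for illustration; the slides do not prescribe a discretization scheme.

```python
from bisect import bisect_right


def make_discretizer(low, high, n_bins):
    """Return a function mapping a real value to a bin index in 0..n_bins-1."""
    width = (high - low) / n_bins
    # Interior bin edges; values outside [low, high] fall into the end bins.
    edges = [low + i * width for i in range(1, n_bins)]

    def discretize(x):
        return bisect_right(edges, x)

    return discretize
```

For example, `make_discretizer(0.0, 1.0, 4)` turns a value between 0 and 1 into one of four discrete states.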




                                                                                   Slide 14
Elements of Reinforcement Learning




                                                        Slide 15
Elements of RL




                Policy: what to do
                Reward: what's good
                Value: what's good because it predicts reward
                Model: what follows what


                                                                Slide 16
Components of an RL Agent
        Policy (behavior)
                Mapping from states to actions
                          π*: S → A
        Reward
                Local reward at time step t:
                          r_t
        Model
                Probability of transition from state s to s' by executing action a
                          T(s,a,s')
        And
                The transition probabilities depend only on these parameters
                The model is not known by the agent
                                                                              Slide 17
Components of an RL Agent
        Value functions
                Vπ(s): Long-term reward estimation from state s following policy π
                Qπ(s,a): Long-term reward estimation from state s executing
                action a and then following policy π
        A simple example
                A maze

                Note that the agent does not know its own position. It can only
                perceive what it has in the surrounding states
                                                                            Slide 18
Pursuing the goal: Maximize long-term reward




                                                                  Slide 20
Goals and Rewards
        OK, but I need to maximize my long-term reward. How do I
        get the long-term reward?
                Long-term reward is defined in terms of the agent's goal
                The agent receives the local reward at each time step


        How?
                Intuitive idea: Sum all the rewards obtained so far




                Problem: the sum can grow without bound in non-ending tasks
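
The formula on this slide did not survive extraction. One common way to write the plain sum is shown below, assuming the rewards are counted from time step t up to a final step T; the symbols R_t and T are assumptions and do not appear in the slide text.

```latex
R_t = r_{t+1} + r_{t+2} + r_{t+3} + \dots + r_T
```

In a task that never ends, T goes to infinity and the sum need not stay finite, which is the problem noted above.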



                                                                             Slide 21
Goals and Rewards
        How can we deal with non-ending tasks?
                Weighted addition of local rewards

                The γ parameter (0 < γ < 1) is the discounting factor

       … r_t, s_t --a_t--> r_{t+1}, s_{t+1} --a_{t+1}--> r_{t+2}, s_{t+2} --a_{t+2}--> r_{t+3}, s_{t+3} --a_{t+3}--> …

                Note the bias for immediate rewards
                          If you want to avoid it, set γ close to 1
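
The weighted sum is usually written R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + …; this is the standard discounted-return form and is assumed here, since the slide's own formula is missing. A short sketch of how it could be computed for a finite list of rewards, with the hypothetical helper name `discounted_return`:

```python
def discounted_return(rewards, gamma):
    """Sum rewards weighted by gamma**k, where k is the delay in time steps."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With a constant reward of 1 per step, the result stays below 1 / (1 − γ) however long the list is, which is why discounting keeps non-ending tasks manageable. A γ near 1 weights distant rewards almost as much as immediate ones.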
                                                                                                 Slide 22
Some examples




                                                      Slide 23
Pole balancing
        Balance the pole
                The car can move forward and backward
                Avoid failure:
                          the pole falling beyond a certain critical angle
                          the car hitting the end of the track


                Reward
                          -1 upon failure
                          -ak, for k steps before failure




                                                                  Slide 24
Mountain Car Problem
        Objective
                Get to the top of the hill as quickly as possible

                State definition:
                          Car position and speed

                Actions
                          Forward, reverse, none

                Reward
                          -1 for each step in which the car is not at the top of the hill
                          In total: -(number of steps before reaching the top of the hill)
                                                                                     Slide 25
Next Class

        How to learn the policies




                                                Slide 26
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 

Lecture21

  • 1. Introduction to Machine Learning. Lecture 21: Reinforcement Learning. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  • 2. Recap of Lectures 5-18. Supervised learning: data classification (labeled data; build a model that covers all the space). Unsupervised learning: clustering (unlabeled data; group similar objects) and association rule analysis (unlabeled data; get the most frequent/important associations). Genetic Fuzzy Systems.
  • 3. Today’s Agenda: Introduction; Reinforcement Learning; Some examples before going further.
  • 4. Introduction. What does reinforcement learning aim at? Learning from interaction (with the environment). Goal-directed learning. [Diagram: the agent receives the state from the environment and acts on it in pursuit of a goal.] Learning what to do and its effect. Trial-and-error search and delayed reward.
  • 5. Introduction. Learn reactive behaviors: behaviors as a mapping between perceptions and actions. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. Dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.
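The exploitation/exploration dilemma on slide 5 can be made concrete with a small sketch. The lecture does not name a specific selection rule; epsilon-greedy is swapped in here as one common choice, and the function name, the `action_values` dictionary, and the `epsilon` parameter are all illustrative assumptions.

```python
import random

def epsilon_greedy(action_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one.

    action_values: dict mapping each action to its current estimated value
                   (hypothetical estimates, not taken from the slides).
    """
    actions = list(action_values)
    if rng.random() < epsilon:
        return rng.choice(actions)               # explore: try anything
    return max(actions, key=action_values.get)   # exploit: use what is known

rng = random.Random(0)
estimates = {"wait": 0.2, "search": 1.0, "recharge": 0.5}
choice = epsilon_greedy(estimates, epsilon=0.1, rng=rng)
```

With `epsilon=0` the agent only exploits and never discovers better actions; with `epsilon=1` it only explores and never uses what it has learned, which is the failure at both extremes that the slide describes.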
  • 6. How Can We Learn It? 1. Look-up tables (Perception → Action: State 1 → Action 1, State 2 → Action 2, State 3 → Action 3, …). 2. Neural networks. 3. Rules. 4. Finite automata.
  • 7. Reinforcement Learning
  • 8. Reinforcement Learning. Reward function: r: S → R or r: S × A → R. [Diagram: the agent receives state s_t and reward r_t from the environment and sends action a_t back to it.] Agent and environment interact at discrete time steps t = 0, 1, 2, … At step t the agent observes state s_t ∈ S, produces action a_t ∈ A(s_t), gets the resulting reward r_{t+1} ∈ R, and moves to the next state s_{t+1}.
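The observe, act, reward, next-state cycle on slide 8 can be sketched as a short loop. Everything named here (`ToyEnvironment`, `run_trial`, `random_policy`, and the two-state reward rule) is a hypothetical stand-in chosen only to make the loop runnable; none of it comes from the lecture.

```python
import random

class ToyEnvironment:
    """Hypothetical two-state environment, used only to exercise the loop."""

    def __init__(self):
        self.state = 0

    def actions(self, state):
        # A(s_t): the actions available in the given state
        return [0, 1]

    def step(self, action):
        # Returns (r_{t+1}, s_{t+1}); reward 1 when the action matches the state
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state
        return reward, self.state

def run_trial(env, policy, n_steps):
    """Run the interaction loop and return the trace of (s_t, a_t, r_{t+1})."""
    trace = []
    state = env.state
    for _ in range(n_steps):
        action = policy(state, env.actions(state))  # agent produces a_t
        reward, next_state = env.step(action)       # environment answers
        trace.append((state, action, reward))
        state = next_state
    return trace

rng = random.Random(0)

def random_policy(state, available_actions):
    # Ignores the state entirely; a learning agent would not
    return rng.choice(available_actions)

trace = run_trial(ToyEnvironment(), random_policy, n_steps=5)
```

The policy is passed in as a plain function so that the loop itself stays independent of how actions are chosen.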
  • 9. Reinforcement Learning. [Diagram: agent-environment loop with state s_t, action a_t, and reward r_t.] Trace of a trial: … s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, … Agent goal: maximize the total amount of reward it receives. That means maximizing not only the immediate reward but the cumulative reward in the long run.
  • 10. Example of RL: recycling robot. State: charge level of the battery. Actions: look for cans, wait for a can, go recharge. Reward: positive for finding cans, negative for running out of battery.
  • 11. More precisely… Restricting to a Markovian Decision Process (MDP): finite set of situations; finite set of actions; transition probabilities; reward probabilities. This means that the agent needs to have complete information of the world, and that state s_{t+1} depends only on state s_t and action a_t.
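  • 12. Recycling Robot Example. [Transition diagram between the High and Low battery states; each edge is labeled "probability, reward".] From High: search stays in High (α, R^search) or drops to Low (1 − α, R^search); wait stays in High (1, R^wait). From Low: search stays in Low (β, R^search) or depletes the battery and returns to High (1 − β, −3); wait stays in Low (1, R^wait); recharge goes to High (1, 0).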
  • 13. Recycling Robot Example. S = {high, low}; A(high) = {wait, search}; A(low) = {wait, search, recharge}. R^search: expected number of cans while searching. R^wait: expected number of cans while waiting. R^search > R^wait.
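The recycling robot on slides 12 and 13 is small enough to write out as a table. The sketch below is one possible encoding, not the lecture's. Assigning the diagram labels to edges follows the usual textbook version of this example, which is an assumption. The numeric values for `alpha`, `beta`, `r_search`, and `r_wait` are made up, chosen only so that R^search > R^wait holds.

```python
import random

def recycling_robot(alpha, beta, r_search, r_wait):
    """Return {state: {action: [(probability, next_state, reward), ...]}}."""
    return {
        "high": {
            "search": [(alpha, "high", r_search), (1 - alpha, "low", r_search)],
            "wait": [(1.0, "high", r_wait)],
        },
        "low": {
            # With probability 1 - beta the battery runs out: reward -3,
            # and the robot ends up back in the high state.
            "search": [(beta, "low", r_search), (1 - beta, "high", -3.0)],
            "wait": [(1.0, "low", r_wait)],
            "recharge": [(1.0, "high", 0.0)],
        },
    }

def sample_step(mdp, state, action, rng):
    """Draw (next_state, reward) according to the transition probabilities."""
    outcomes = mdp[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

def expected_reward(mdp, state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in mdp[state][action])

# Illustrative (assumed) parameter values
mdp = recycling_robot(alpha=0.9, beta=0.4, r_search=2.0, r_wait=1.0)
next_state, reward = sample_step(mdp, "low", "recharge", random.Random(0))
```

Because the next state and reward depend only on the current state and action, the table is all that is needed to simulate the robot, which is the Markov property stated on slide 11.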
  • 14. Breaking the Markovian Property. Possible problems that do not satisfy the MDP assumptions: (1) When actions and states are not finite. Solution: discretize the sets of actions and states. (2) When transition probabilities do not depend only on the current state. Possible solution: represent states as structures built up over time from sequences of sensations. This is a POMDP (partially observable MDP); use POMDP algorithms to solve these problems.
  • 15. Elements of Reinforcement Learning
  • 16. Elements of RL. Policy: what to do. Reward: what’s good. Value: what’s good because it predicts reward. Model: what follows what.
  • 17. Components of an RL Agent. Policy (behavior): mapping from states to actions, π*: S → A. Reward: local reward at step t, r_t. Model: probability of transition from state s to s’ by executing action a, T(s, a, s’). The transition probabilities depend only on these parameters. The model is not known by the agent.
  • 18. Components of an RL Agent. Value functions: V^π(s), the long-term reward estimate from state s when following policy π; Q^π(s, a), the long-term reward estimate from state s when executing action a and then following policy π. A simple example: a maze. Note that the agent does not know its own position; it can only perceive what is in the surrounding states.
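The slide describes the two value functions only in words. One standard way to write them is sketched below, assuming the long-term reward is the discounted sum of local rewards with the discount factor γ (0 < γ < 1) that the lecture introduces on slide 22. The expectation notation comes from the usual textbook treatment and is an assumption, not something shown on the slides.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right]
\qquad
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a\right]
```

The only difference between the two is the extra condition a_t = a: Q^π fixes the first action and follows π afterwards, while V^π follows π from the start.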
  • 20. Pursuing the goal: Maximize long-term reward
  • 21. Goals and Rewards. OK, but I need to maximize my long-term reward. How do I get the long-term reward? The long-term reward is defined in terms of the goal of the agent. The agent receives the local reward at each time step. How? Intuitive idea: sum all the rewards obtained so far. Problem: the sum can grow without bound in non-ending tasks.
  • 22. Goals and Rewards. How can we deal with non-ending tasks? Weighted addition of local rewards. The γ parameter (0 < γ < 1) is the discounting factor. Trace: … s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, … Note the bias toward immediate rewards. If you want to avoid it, set γ close to 1.
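The formula for the weighted addition did not survive in the transcript. A common form, assumed here, is r_{t+1} + γ r_{t+2} + γ² r_{t+3} + …, where each later reward is multiplied by one more factor of γ. The sketch below computes that sum for a finite list of rewards; the function name and the example numbers are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], with rewards[0] playing the role of r_{t+1}."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# A long task with a constant reward of 1 per step:
# the plain sum would be 1000, the discounted sum stays below 1 / (1 - gamma).
long_run = discounted_return([1.0] * 1000, gamma=0.9)
```

This shows both points on the slide: the discounted sum stays bounded in non-ending tasks, and a small γ shrinks later rewards quickly, which is the bias toward immediate rewards that a γ close to 1 reduces.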
  • 23. Some examples
  • 24. Pole balancing. Balance the pole. The car can move forward and backward. Avoid failure: the pole falling beyond a certain critical angle, or the car hitting the end of the track. Reward: -1 upon failure; -ak, for k steps before failure.
  • 25. Mountain Car Problem. Objective: get to the top of the hill as quickly as possible. State definition: car position and speed. Actions: forward, reverse, none. Reward: -1 for each step in which the car is not at the top of the hill, so the total is minus the number of steps before reaching the top of the hill.
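The mountain car reward scheme can be tried out with a minimal simulation. The slides give only the state, actions, and reward; the dynamics, the position and speed limits, the starting point, and the hand-written `momentum_policy` below are all assumptions borrowed from the common textbook formulation of this problem, not from the lecture.

```python
import math

# Assumed constants from the common textbook formulation; none are in the slides.
MIN_POSITION, GOAL_POSITION = -1.2, 0.5
MAX_SPEED = 0.07

def step(position, velocity, action):
    """Advance one time step. action is -1 (reverse), 0 (none) or +1 (forward)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    if position < MIN_POSITION:
        # Hitting the left wall stops the car
        position, velocity = MIN_POSITION, 0.0
    done = position >= GOAL_POSITION
    reward = -1.0  # every step taken before reaching the top costs 1
    return position, velocity, reward, done

def run_episode(policy, max_steps=1000):
    """Return (steps, total_reward, reached_goal) for one episode."""
    position, velocity = -0.5, 0.0  # assumed start: at rest in the valley
    total_reward = 0.0
    for steps in range(1, max_steps + 1):
        action = policy(position, velocity)
        position, velocity, reward, done = step(position, velocity, action)
        total_reward += reward
        if done:
            return steps, total_reward, True
    return max_steps, total_reward, False

def momentum_policy(position, velocity):
    """Hypothetical hand-written policy: push in the direction of motion."""
    return 1 if velocity >= 0 else -1

steps, total_reward, reached = run_episode(momentum_policy)
```

Because every step costs 1, the total reward is exactly minus the number of steps taken, so maximizing reward is the same as reaching the top as quickly as possible.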
  • 26. Next Class: how to learn the policies