Hierarchical Reinforcement Learning
Amir massoud Farahmand
Farahmand@SoloGen.net
Markov Decision Problems
• Markov Process: formulating a wide range of dynamical systems
• Finding an optimal solution with respect to an objective function
• [Stochastic] Dynamic Programming
• Planning: known environment
• Learning: unknown environment
MDP
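A minimal formal sketch in standard notation (assumed here, not reproduced from the slide): an MDP is a tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$ with states $\mathcal{S}$, actions $\mathcal{A}$, transition kernel $P(s' \mid s, a)$, reward function $R(s, a)$, and discount factor $\gamma \in [0, 1)$. The optimal value function satisfies the Bellman optimality equation

$$V^*(s) = \max_{a \in \mathcal{A}} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big].$$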
Reinforcement Learning (1)
• A very important Machine Learning method
• An approximate, online solution of the MDP
  – Monte Carlo method
  – Stochastic Approximation
  – [Function Approximation]
Reinforcement Learning (2)
• Q-Learning and SARSA are among the most important solution methods in RL
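As a concrete illustration, a minimal sketch of the two tabular update rules; the epsilon-greedy helper and the hyper-parameter values are illustrative assumptions, not from the slides:

```python
import numpy as np

def epsilon_greedy(Q, s, eps=0.1):
    # Epsilon-greedy action selection over a tabular Q (shape [n_states, n_actions]).
    if np.random.rand() < eps:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[s]))

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Q-Learning (off-policy): bootstrap from the greedy action at s_next.
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # SARSA (on-policy): bootstrap from the action actually taken at s_next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```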
Curses of DP
• Curse of Modeling
  – RL solves this problem
• Curse of Dimensionality
  – Approximating the value function
  – Hierarchical methods
Hierarchical RL (1)
• Use some kind of hierarchy in order to …
  – Learn faster
  – Need fewer values to be updated (smaller storage requirement)
  – Incorporate a priori knowledge from the designer
  – Increase reusability
  – Have a more meaningful structure than a mere
    Q-table
Hierarchical RL (2)
• Is there any unified meaning of
  hierarchy? NO!
• Different methods:
  –   Temporal abstraction
  –   State abstraction
  –   Behavioral decomposition
  –   …
Hierarchical RL (3)
•   Feudal Q-Learning [Dayan, Hinton]
•   Options [Sutton, Precup, Singh]
•   MaxQ [Dietterich]
•   HAM [Russell, Parr, Andre]
•   HexQ [Hengst]
•   Weakly-Coupled MDP [Bernstein, Dean & Lin, …]
•   Structure Learning in SSA [Farahmand, Nili]
•   …
Feudal Q-Learning
•   Divide each task into a few smaller sub-tasks
•   A state abstraction method
•   Different layers of managers
•   Each manager takes orders from its super-manager and gives orders to its sub-managers

                Super-Manager
        Manager 1         Manager 2
    Sub-Manager 1     Sub-Manager 2
Feudal Q-Learning
• Principles of Feudal Q-Learning
   – Reward Hiding: Managers must reward sub-managers for doing
     their bidding whether or not this satisfies the commands of the
     super-managers. Sub-managers should just learn to obey their
     managers and leave it up to them to determine what it is best
     to do at the next level up.
   – Information Hiding: Managers only need to know the state of the
     system at the granularity of their own choices of tasks. Indeed,
     allowing some decision making to take place at a coarser grain is
     one of the main goals of the hierarchical decomposition.
     Information is hidden both downwards (sub-managers do not know
     the task the super-manager has set the manager) and upwards (a
     super-manager does not know what choices its manager has made to
     satisfy its command).
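A toy sketch of the Reward Hiding principle; the function name and the obedience-based reward are hypothetical illustrations, not the original feudal system's implementation:

```python
# Hypothetical sketch: the manager pays its sub-manager for reaching
# the commanded sub-goal, regardless of the environment's own reward.
def managerial_reward(command, achieved_state, env_reward):
    # env_reward is deliberately ignored here (Reward Hiding): the
    # sub-manager learns obedience; the manager itself learns from
    # env_reward at its own, coarser level of the hierarchy.
    return 1.0 if achieved_state == command else 0.0
```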
Options: Introduction
• People make decisions at different time scales
  – Traveling example
• It is desirable to have a method that supports such temporally extended actions over different time scales
Options: Concept
• Macro-actions
• Temporal abstraction method of Hierarchical RL
• Options are temporally extended actions, each of which consists of a set of primitive actions
• Example:
   – Primitive actions: walking N/S/W/E
   – Options: go to {door, corner, table, straight}
      • Options can be Open-loop or Closed-loop
• Semi-Markov Decision Process Theory [Puterman]
Options: Formal Definitions
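A sketch of the standard formalization (following Sutton, Precup & Singh; the slide's own equations are not reproduced here): an option is a triple

$$o = \langle \mathcal{I}, \pi, \beta \rangle,$$

where $\mathcal{I} \subseteq \mathcal{S}$ is the initiation set (states where $o$ may start), $\pi : \mathcal{S} \times \mathcal{A} \to [0,1]$ is the intra-option policy, and $\beta : \mathcal{S} \to [0,1]$ gives the termination probability in each state.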
Options: Rise of SMDP!
• Theorem: MDP + Options = SMDP
Options: Value function
Options: Bellman-like optimality condition
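In standard options-framework notation (assumed here), the value of a policy over options $\mu$ and the Bellman-like optimality condition take the form

$$Q^{\mu}(s, o) = \mathbb{E}\big[ r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{k-1} r_{t+k} + \gamma^{k} V^{\mu}(s_{t+k}) \,\big|\, o \text{ initiated in } s \big],$$

$$V^*_{\mathcal{O}}(s) = \max_{o \in \mathcal{O}_s} \mathbb{E}\big[ r + \gamma^{k} V^*_{\mathcal{O}}(s') \,\big|\, s, o \big],$$

where $k$ is the (random) duration of the option and $r$ is the discounted reward accumulated while it runs.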
Options: A simple example
Interrupting Options
• An option’s policy is followed until it terminates.
• This is a somewhat unnecessary condition
  – You may change your decision in the middle of executing your previous decision.
• Interruption Theorem: Yes! It is better!
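A hedged statement of the result (after Sutton, Precup & Singh): if $\mu'$ interrupts an executing option $o$ whenever another option looks strictly better, i.e. whenever $Q^{\mu}(s, o) < V^{\mu}(s)$, then

$$V^{\mu'}(s) \ge V^{\mu}(s) \quad \text{for all } s.$$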
Interrupting Options: An example
Options: Other issues
• Intra-option {model, value} learning
• Learning each option
  – Defining a sub-goal reward function
MaxQ
• MaxQ Value Function Decomposition
• Somewhat related to Feudal Q-Learning
• Decomposes the value function over a hierarchical structure
MaxQ
MaxQ: Value decomposition
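The decomposition itself, in standard MaxQ notation (assumed here): the value of doing subtask $a$ inside parent task $i$ splits into the value of $a$ itself plus a completion function $C$,

$$Q(i, s, a) = V(a, s) + C(i, s, a),$$

$$V(i, s) = \begin{cases} \max_{a} Q(i, s, a) & \text{if } i \text{ is composite,} \\ \mathbb{E}[\, r \mid s, i \,] & \text{if } i \text{ is primitive,} \end{cases}$$

so the root value unrolls along a path of the hierarchy as $V(0, s) = V(a_k, s) + C(a_{k-1}, s, a_k) + \cdots + C(0, s, a_1)$.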
MaxQ: Existence theorem
• Recursively optimal policies.
• There may be many recursively optimal policies with different value functions.
• A recursively optimal policy is not necessarily an optimal policy.
• If H is a stationary macro hierarchy for MDP M, then all recursively optimal policies w.r.t. M and H have the same value.
MaxQ: Learning
• Theorem: If M is an MDP, H is a stationary macro hierarchy, the policy is GLIE (Greedy in the Limit with Infinite Exploration), and the common convergence conditions hold (bounded V and C, the learning rates satisfy the usual summability conditions, …), then with probability 1 the MaxQ-0 algorithm will converge!
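A compact sketch of the recursive MaxQ-0 algorithm under these conditions; the environment and hierarchy interfaces (env.step, is_primitive, terminated, subtasks, choose_action) and the tabular containers V and C are assumptions for illustration:

```python
from collections import defaultdict

# Tabular containers: V[i][s] for primitive subtasks,
# C[i][(s, a)] completion values for composite subtasks.
V = defaultdict(lambda: defaultdict(float))
C = defaultdict(lambda: defaultdict(float))

def V_eval(i, s):
    # Recursively evaluate the decomposed value of subtask i at state s.
    if is_primitive(i):
        return V[i][s]
    return max(V_eval(a, s) + C[i][(s, a)] for a in subtasks(i))

def maxq0(i, s, alpha=0.1, gamma=0.99):
    """Run subtask i from state s; return (steps_taken, resulting_state)."""
    if is_primitive(i):
        s2, r = env.step(s, i)                 # assumed environment interface
        V[i][s] += alpha * (r - V[i][s])
        return 1, s2
    steps = 0
    while not terminated(i, s):                # assumed termination predicate
        a = choose_action(i, s)                # assumed GLIE exploration policy
        n, s2 = maxq0(a, s, alpha, gamma)
        target = (gamma ** n) * max(V_eval(a2, s2) + C[i][(s2, a2)]
                                    for a2 in subtasks(i))
        C[i][(s, a)] += alpha * (target - C[i][(s, a)])
        steps += n
        s = s2
    return steps, s
```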
MaxQ
• Faster learning: all-states updating
  – Similar to Kaelbling’s “all-goals updating”
MaxQ
MaxQ: State abstraction
• Advantages
  – Memory reduction
  – Less exploration is needed
  – Increased reusability, as a subtask does not depend on its ancestors
• Is it possible?!
MaxQ: State abstraction
• Exact preservation of value function
• Approximate preservation
MaxQ: State abstraction
• Does it converge?
  – It has not been proved formally yet.
• What can we do if we want to use an abstraction that violates Theorem 3?
  – Reward function decomposition
    • Design a reward function that reinforces the responsible parts of the architecture.
MaxQ: Other issues
• Undesired terminal states
• Non-hierarchical execution (polling execution)
  – Better performance
  – Computationally intensive
Learning in Subsumption Architecture
• Structure learning
  – How should behaviors be arranged in the architecture?
• Behavior learning
  – How should a single behavior act?
• Structure/Behavior learning
SSA: Purely Parallel Case
• Sensors feed a set of parallel behaviors: manipulate the world, build maps, explore, avoid obstacles, locomote
SSA: Structure learning issues
• How should we represent the structure?
  – Sufficient (the problem space can be covered)
  – Tractable (small hypothesis space)
  – Well-defined credit assignment
• How should we assign credit to the architecture?
SSA: Structure learning issues
• Purely parallel structure
  – Is it the most plausible choice (regarding SSA-BBS assumptions)?
• Some different representations
  – Behavior learning
  – Behavior/Layer learning
  – Order learning
SSA: Behavior learning issues
• Reinforcement signal decomposition: each behavior has its own reward function (see the sketch after this list)
• Reinforcement signal design: how should we transform our desires into a reward function?
  – Reward Shaping
  – Emotional Learning
  – …?
• Hierarchical Credit Assignment
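A minimal sketch of reinforcement-signal decomposition; the behavior names and state fields are hypothetical:

```python
# Each behavior owns its reward function and learns only from it.
behavior_rewards = {
    "avoid_obstacles": lambda s: -1.0 if s["collision"] else 0.0,
    "explore":         lambda s: +1.0 if s["new_cell"] else 0.0,
    "locomote":        lambda s: s["forward_progress"],
}

def rewards_for(state):
    # One independent reinforcement signal per behavior.
    return {name: r(state) for name, r in behavior_rewards.items()}
```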
SSA: Structure Learning example
•   Suppose we have correct behaviors and want to arrange them in an architecture in order to maximize a specific behavior
•   Subjective evaluation: we want to lift an object to a specific height while its slope does not become too high.
•   Objective evaluation: how should we design it?!