Marinier Laird Cogsci 2008 Emotionrl Pres

Emotion-Driven Reinforcement Learning
Bob Marinier & John Laird
University of Michigan, Computer Science and Engineering
CogSci’08

Introduction
- Interested in the functional benefits of emotion for a cognitive agent
  - Appraisal theories of emotion
  - PEACTIDM theory of cognitive control
- Use emotion as a reward signal to a reinforcement learning agent
  - Demonstrates a functional benefit of emotion
  - Provides a theory of the origin of intrinsic reward

Outline
- Background
- Integration of emotion and cognition
- Integration of emotion and reinforcement learning
- Implementation in Soar
- Learning task
- Results

Appraisal Theories of Emotion
- A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
  - Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
- Appraisals influence emotion
- Emotion can then be coped with (via internal or external actions)
[Diagram labels: Situation, Goals, Appraisals, Emotion, Coping]

Appraisals to Emotions (Scherer 2001)

Cognitive Control: PEACTIDM (Newell 1990)

Unification of PEACTIDM and Appraisal Theories
[Diagram of the PEACTIDM cycle annotated with appraisals.
Stages shown: Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor.
Data labels: Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Action, Prediction, Motor Commands, Environmental Change.
Appraisal labels: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power, Outcome Probability.]

Distinction between emotion, mood, and feeling (Marinier & Laird 2007)
- Emotion: Result of appraisals
  - Is about the current situation
- Mood: “Average” over recent emotions
  - Provides historical context
- Feeling: Emotion “+” Mood
  - What agent actually perceives

Emotion, mood, and feeling
[Diagram labels: Cognition, Active Appraisals, Emotion, Mood, Combination Function, Feeling, Perceived Feeling; Mood is marked with Pull and Decay.]
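The relationship between emotion, mood, and feeling can be sketched in a few lines. This is only an illustration under stated assumptions: the slides name a decay, a pull, and a combination function but give no formulas, so the constants `MOOD_DECAY` and `MOOD_PULL`, the linear pull, and the clamped sum used for `combine` are all hypothetical stand-ins, not the authors' definitions.

```python
# Emotion, mood, and feeling as vectors of appraisal values (one value per dimension).
# All constants and formulas here are assumptions made for illustration.

MOOD_DECAY = 0.99  # assumed: fraction of mood retained on each step
MOOD_PULL = 0.1    # assumed: fraction of the gap to the current emotion closed each step


def update_mood(mood, emotion, decay=MOOD_DECAY, pull=MOOD_PULL):
    """Decay the mood toward zero, then pull it toward the current emotion."""
    decayed = [m * decay for m in mood]
    return [m + pull * (e - m) for m, e in zip(decayed, emotion)]


def combine(emotion, mood):
    """Stand-in combination function: per-dimension sum clamped to [-1, 1]."""
    return [max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood)]


mood = [0.0, 0.0]
emotion = [1.0, -1.0]
mood = update_mood(mood, emotion)   # mood drifts toward the emotion
feeling = combine(emotion, mood)    # what the agent would perceive
```

Because the mood changes slowly, it keeps contributing to the feeling on steps where the current emotion is weak, which is the "historical context" role described above.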

Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
[Diagram labels: External Environment (Actions, Sensations); “Organism” containing an Internal Environment with a Critic and an Agent (States, Rewards, Actions, Decisions). In the emotion-driven version the Critic is labeled Appraisal Process and the reward is labeled +/- Feeling Intensity.]
Reward = Intensity * Valence
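As a rough sketch of how such a reward could feed a learner, the snippet below computes Reward = Intensity * Valence and applies it in a one-step tabular Q-learning update. The slides do not say which update rule is used, so the Q-learning rule, the constants `ALPHA` and `GAMMA`, and the state and action names are assumptions chosen only for illustration.

```python
from collections import defaultdict

ALPHA = 0.1  # assumed learning rate
GAMMA = 0.9  # assumed discount factor


def feeling_reward(intensity, valence):
    """Reward = Intensity * Valence (valence taken here as +1.0 or -1.0)."""
    return intensity * valence


def q_update(q, state, action, r, next_state, next_actions,
             alpha=ALPHA, gamma=GAMMA):
    """One-step tabular Q-learning update; returns the new Q-value."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)                           # Q-values default to 0.0
r = feeling_reward(intensity=0.8, valence=-1.0)  # a strong negative feeling
new_value = q_update(q, "s0", "move-east", r, "s1", ["move-south"])
```

The point of the sketch is that a reward is available on every step where the agent has a feeling, rather than only when an external goal is reached.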

Extending Soar with Emotion (Marinier & Laird 2007)
[Diagram of the Soar architecture. Labels: Symbolic Long-Term Memories (Procedural, Semantic, Episodic); learning mechanisms (Reinforcement Learning, Chunking, Semantic Learning, Episodic Learning); Appraisal Detector; Short-Term Memory (Situation, Goals); Decision Procedure; Visual Imagery; Perception; Action; Body.]

Extending Soar with Emotion (Marinier & Laird 2007)
[Same Soar architecture diagram with the emotion extension added. Additional labels: Appraisals, Feelings, +/- Intensity (into Reinforcement Learning), and the regions Knowledge, Architecture, Body.
Example values shown in the Appraisal Detector:
Emotion .5,.7,0,-.4,.3,…
Mood .7,-.2,.8,.3,.6,…
Feeling .9,.6,.5,-.1,.8,…]

Learning task: Encoding

Direction  Passable  On path  Progress
North      false     false    true
East       false     true     true
West       false     false    true
South      true      true     true

Learning task: Encoding & Appraisal

Direction  Intrinsic Pleasantness  Goal Relevance  Unpredictability
North      Low                     Low             High
East       Low                     High            High
West       Low                     Low             High
South      Neutral                 High            Low
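One way to read the two slides together is as a mapping from encoded features to appraisal values. The sketch below uses a rule inferred only from the four rows shown (passable drives Intrinsic Pleasantness and Unpredictability, on-path drives Goal Relevance); that rule is an assumption that happens to fit these rows, not the authors' stated appraisal rules, and the function and dictionary names are hypothetical. Progress is true in every row, so it is left out.

```python
def appraise(passable, on_path):
    """Map encoded features to appraisals (rule inferred from the slide rows only)."""
    return {
        "intrinsic_pleasantness": "neutral" if passable else "low",
        "goal_relevance": "high" if on_path else "low",
        "unpredictability": "low" if passable else "high",
    }


# Encoded stimuli from the Encoding slide: direction -> (passable, on_path)
ENCODING = {
    "north": (False, False),
    "east": (False, True),
    "west": (False, False),
    "south": (True, True),
}

appraisals = {direction: appraise(*features)
              for direction, features in ENCODING.items()}
```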

Learning task: Attending, Comprehending & Appraisal

South
  Intrinsic Pleasantness: Neutral
  Goal Relevance: High
  Unpredictability: Low
  Conduciveness: High
  Control: High
  …

Learning task: Tasking
[Figure: Optimal Subtasks]

What is being learned?
- When to Attend vs. Task
- If Attending, what to Attend to
- If Tasking, which subtask to create
- When to Intend vs. Ignore

Results: With and without mood

Discussion
- Agent learns both internal (tasking) and external (movement) actions
- Emotion allows for more frequent rewards, so the agent learns faster than with standard RL
- Mood “fills in the gaps”, allowing for even faster learning and less variability
