Planning, Acting, and Learning
Chapter 10
Contents
 The Sense/Plan/Act Cycle
 Approximate Search
 Learning Heuristic Functions
 Rewards Instead of Goals
Learning Heuristic Functions
 Learning from experience
 Continuous feedback from the environment is one way to
reduce uncertainty and to compensate for an agent’s lack
of knowledge about the effects of its actions.
 Useful information can be extracted from the experience of
interacting with the environment.
 Explicit Graphs and Implicit Graphs
Learning Heuristic Functions
 Explicit Graphs
 Agent has a good model of the effects of its actions and
knows the costs of moving from any node to its successor
nodes.
 $c(n_i, n_j)$: the cost of moving from $n_i$ to $n_j$.
 $\sigma(n, a)$: the description of the state reached from node $n$ after
taking action $a$.
 DYNA [Sutton 1990]
 Combination of “learning in the world” with “learning and
planning in the model”.
$$\hat{h}(n_i) \leftarrow \min_{n_j \in S(n_i)} \left[ \hat{h}(n_j) + c(n_i, n_j) \right]$$
$$a = \arg\min_{a} \left[ \hat{h}(\sigma(n_i, a)) + c(n_i, \sigma(n_i, a)) \right]$$
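The backup and the greedy action choice for an explicit graph can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary layout `graph[n][action] = (successor, cost)`, the function names, and the trial loop are hypothetical and do not come from the slides.

```python
from typing import Dict, Hashable, List, Tuple

Node = Hashable
# Assumed layout: graph[n][action] = (successor node, step cost).
Graph = Dict[Node, Dict[str, Tuple[Node, float]]]


def backup(h: Dict[Node, float], graph: Graph, n: Node) -> None:
    """h(n) <- min over successors n_j of [h(n_j) + c(n, n_j)]."""
    if graph[n]:  # nodes without successors keep their current estimate
        h[n] = min(h[nj] + cost for nj, cost in graph[n].values())


def greedy_action(h: Dict[Node, float], graph: Graph, n: Node) -> str:
    """argmin over actions a of [h(sigma(n, a)) + c(n, sigma(n, a))]."""
    return min(graph[n], key=lambda a: h[graph[n][a][0]] + graph[n][a][1])


def run_trial(h: Dict[Node, float], graph: Graph, start: Node, goal: Node,
              max_steps: int = 100) -> List[Node]:
    """Walk from start toward goal, backing up h at every visited node."""
    n, path = start, [start]
    for _ in range(max_steps):
        if n == goal:
            break
        backup(h, graph, n)
        n = graph[n][greedy_action(h, graph, n)][0]
        path.append(n)
    return path
```

Starting from an all-zero table, repeated trials move the estimates toward the true costs along the paths the agent actually follows.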
Learning Heuristic Functions
 Implicit Graphs
 Impractical to make an explicit graph or table of all the
nodes and their transitions.
 To learn the heuristic function while performing a search
process.
 e.g.) Eight-puzzle
 $W(n)$: the number of tiles in the wrong place; $P(n)$: the sum of
the distances of each tile from “home”.
$$\hat{h}(n) = w_1 W(n) + w_2 P(n) + \cdots$$
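A minimal Python sketch of this weighted heuristic for the Eight-puzzle follows. Several details are assumptions made for illustration: the goal layout `GOAL`, the use of Manhattan distance for "distance from home", the tuple encoding of a board (row by row, 0 for the blank), and the default weights.

```python
from typing import Tuple

State = Tuple[int, ...]  # 9 entries, read row by row; 0 marks the blank

# Assumed goal layout; the slides do not fix one.
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def W(state: State, goal: State = GOAL) -> int:
    """Number of tiles (blank excluded) that are not in their goal cell."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def P(state: State, goal: State = GOAL) -> int:
    """Sum over tiles of the Manhattan distance to the goal cell (assumed metric)."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        gidx = goal.index(tile)
        total += abs(idx // 3 - gidx // 3) + abs(idx % 3 - gidx % 3)
    return total


def h_hat(state: State, w1: float = 1.0, w2: float = 1.0) -> float:
    """Weighted combination w1 * W(n) + w2 * P(n)."""
    return w1 * W(state) + w2 * P(state)
```

The weights `w1` and `w2` are the quantities a learning procedure would adjust; here they are plain parameters.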
Learning Heuristic Functions
 Learning the weights
 Minimizing the sum of the squared errors between the
training samples and the $\hat{h}$ function given by the
weighted combination.
 Node expansion
 Temporal difference learning [Sutton 1988]: the weight
adjustment depends only on two temporally adjacent
values of a function.
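$$\hat{h}(n_i) \leftarrow \hat{h}(n_i) + \beta \left\{ \min_{n_j \in S(n_i)} \left[ \hat{h}(n_j) + c(n_i, n_j) \right] - \hat{h}(n_i) \right\}$$
$$\hat{h}(n_i) \leftarrow (1 - \beta)\, \hat{h}(n_i) + \beta \min_{n_j \in S(n_i)} \left[ \hat{h}(n_j) + c(n_i, n_j) \right]$$
 Here $0 < \beta < 1$ is the learning rate.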
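The temporal-difference update can be sketched in Python in two variants. The tabular version follows the $(1 - \beta)$ form of the update directly. The weight version is an assumption added for illustration: it applies an LMS-style step to a linear $\hat{h}$, which is one common way to turn the error into a weight adjustment and is not spelled out in the slides. All function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable
Successors = Sequence[Tuple[Node, float]]  # (successor node, step cost) pairs


def td_update_tabular(h: Dict[Node, float], n: Node,
                      successors: Successors, beta: float) -> float:
    """h(n) <- (1 - beta) * h(n) + beta * min_j [h(n_j) + c(n, n_j)]."""
    target = min(h[nj] + cost for nj, cost in successors)
    h[n] = (1.0 - beta) * h[n] + beta * target
    return h[n]


def td_update_weights(w: List[float],
                      features: Callable[[Node], Sequence[float]],
                      n: Node, successors: Successors,
                      beta: float) -> List[float]:
    """Assumed LMS-style step for a linear h(n) = sum_k w_k * f_k(n).

    The error is the difference between the backed-up value and the
    current estimate, i.e. two temporally adjacent values of h.
    """
    def h(m: Node) -> float:
        return sum(wk * fk for wk, fk in zip(w, features(m)))

    error = min(h(nj) + cost for nj, cost in successors) - h(n)
    return [wk + beta * error * fk for wk, fk in zip(w, features(n))]
```

With `beta` close to 0 the old estimate dominates; with `beta` close to 1 the estimate jumps almost fully to the backed-up value.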
Rewards Instead of Goals
 State-space search
 more theoretical conditions
 It is assumed that the agent has a single, short-term task
that can be described by a goal condition.
 Practical problem
 the task cannot be so simply stated.
 The user expresses his or her satisfaction and dissatisfaction
with task performance by giving the agent positive and
negative rewards.
 The agent's task can then be formalized as maximizing the
amount of reward it receives.
Rewards Instead of Goals
 Seeking an action policy that maximizes reward
 Policy Improvement by Its Iteration
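 $\pi$: policy function on nodes whose value is the action prescribed
by that policy at that node.
 $r(n_i, a)$: the reward received by the agent when it takes
action $a$ at $n_i$.
 $\rho(n_j)$: the value of any special reward given for reaching node $n_j$.
$$r(n_i, a) = -c(n_i, n_j) + \rho(n_j)$$
$$V^{\pi}(n_i) = r(n_i, \pi(n_i)) + \gamma V^{\pi}(n_j)$$
$$V^{\pi^*}(n_i) = \max_{a} \left[ r(n_i, a) + \gamma V^{\pi^*}(n_j) \right]$$
 Here $n_j$ is the node reached from $n_i$ by the chosen action and $\gamma$ is the discount factor.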
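The reward definition and the two value equations can be sketched in Python for a small deterministic graph. This is an illustration only: the layout `model[n][action] = (successor, cost)`, the dictionary `rho` of special rewards, the fixed number of sweeps, and all function names are assumptions, not part of the slides.

```python
from typing import Dict, Hashable, Tuple

Node = Hashable
# Assumed layout: model[n][action] = (successor node, step cost).
Model = Dict[Node, Dict[str, Tuple[Node, float]]]


def reward(model: Model, rho: Dict[Node, float], n: Node, a: str) -> float:
    """r(n_i, a) = -c(n_i, n_j) + rho(n_j); rho defaults to 0 when unset."""
    nj, cost = model[n][a]
    return -cost + rho.get(nj, 0.0)


def evaluate_policy(model: Model, rho: Dict[Node, float],
                    policy: Dict[Node, str], gamma: float,
                    sweeps: int = 50) -> Dict[Node, float]:
    """Repeatedly apply V(n_i) = r(n_i, pi(n_i)) + gamma * V(n_j)."""
    V = {n: 0.0 for n in model}
    for _ in range(sweeps):
        for n in model:
            if model[n]:  # terminal nodes keep value 0
                nj = model[n][policy[n]][0]
                V[n] = reward(model, rho, n, policy[n]) + gamma * V[nj]
    return V


def optimal_values(model: Model, rho: Dict[Node, float], gamma: float,
                   sweeps: int = 50) -> Dict[Node, float]:
    """Repeatedly apply V(n_i) = max_a [r(n_i, a) + gamma * V(n_j)]."""
    V = {n: 0.0 for n in model}
    for _ in range(sweeps):
        for n in model:
            if model[n]:
                V[n] = max(reward(model, rho, n, a) + gamma * V[model[n][a][0]]
                           for a in model[n])
    return V
```

Comparing `evaluate_policy` for a fixed policy with `optimal_values` shows how much value that policy gives up at each node.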
 Value iteration
 [Barto, Bradtke, and Singh, 1995]
 delayed-reinforcement learning
 learning action policies in settings in which rewards depend on
a sequence of earlier actions
 temporal credit assignment
 credit those state-action pairs most responsible for the reward
 structural credit assignment
 in state spaces too large for us to store the entire graph, we must
aggregate states with similar $\hat{V}$ values.
 [Kaelbling, Littman, and Moore, 1996]
$$\pi^*(n_i) = \arg\max_{a} \left[ r(n_i, a) + \gamma V^{\pi^*}(n_j) \right]$$
$$\hat{V}(n_i) \leftarrow (1 - \beta)\, \hat{V}(n_i) + \beta \left[ r(n_i, a) + \gamma \hat{V}(n_j) \right]$$
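The incremental value update and the greedy policy it supports can be sketched in Python as below. The sketch assumes a deterministic transition table `model[n][action] = (successor, cost)` and a dictionary `rho` of special rewards; both layouts and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Tuple

Node = Hashable
# Assumed layout: model[n][action] = (successor node, step cost).
Model = Dict[Node, Dict[str, Tuple[Node, float]]]


def value_update(V: Dict[Node, float], n: Node, r: float, n_next: Node,
                 beta: float, gamma: float) -> float:
    """V(n_i) <- (1 - beta) * V(n_i) + beta * [r(n_i, a) + gamma * V(n_j)]."""
    V[n] = (1.0 - beta) * V[n] + beta * (r + gamma * V[n_next])
    return V[n]


def greedy_action(V: Dict[Node, float], model: Model,
                  rho: Dict[Node, float], n: Node, gamma: float) -> str:
    """argmax over actions a of [r(n_i, a) + gamma * V(n_j)]."""
    def score(a: str) -> float:
        nj, cost = model[n][a]
        return -cost + rho.get(nj, 0.0) + gamma * V[nj]

    return max(model[n], key=score)
```

Each call to `value_update` uses a single observed transition, so the table can be improved while the agent acts rather than by sweeping over the whole graph.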
