Reinforcement Learning ⇒ Dynamic Programming ⇒
Markov Decision Process
Subject: Machine Learning
Dr. Varun Kumar
Subject: Machine Learning Dr. Varun Kumar Lecture 9 1 / 16
Outlines
1 Introduction to Reinforcement Learning
2 Application of Reinforcement Learning
3 Approach for Studying Reinforcement Learning
4 Basics of Dynamic Programming
5 Markov Decision Process:
6 References
Introduction to reinforcement learning:
Key Feature
1 There is no supervisor for performing the learning process.
2 Instead of a supervisor, there is a critic that informs about the end outcome.
3 If the outcome is meaningful, the whole process is rewarded; otherwise,
the whole process is penalized.
4 The learning process is thus based on reward and penalty.
5 The critic converts the primary reinforcement signal into a heuristic
reinforcement signal.
6 Primary reinforcement signal → signal observed from the environment.
7 Heuristic reinforcement signal → higher-quality signal.
Difference between critic and supervisor
Let a complex system be described as follows.
Note
⇒ The critic does not provide a step-by-step solution.
⇒ The critic does not provide any method, training data, suitable learning
system, or logical operation for making the necessary correction if the
output does not reach the expected value.
⇒ The critic comments only on the end output, whereas a supervisor helps
in many ways.
Block diagram of reinforcement learning
Block diagram
Aim of reinforcement learning
⇒ To minimize the cost-to-go function.
⇒ Cost-to-go function → the expectation of the cumulative cost of actions
taken over a sequence of steps, rather than the immediate cost.
⇒ Learning system: it discovers actions and feeds them back to
the environment.
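The distinction above can be sketched in a few lines: a minimal, purely illustrative example in which the per-step costs and the action sequence are assumed values, not anything from the lecture.

```python
# Hypothetical sketch: cost-to-go (cumulative cost over a sequence of
# steps) versus the immediate single-step cost. All numbers below are
# illustrative assumptions.

def cost_to_go(costs):
    """Cumulative cost of an action sequence: the sum of per-step costs."""
    return sum(costs)

step_costs = [2.0, 1.0, 4.0, 0.5]    # cost incurred at each step
immediate = step_costs[0]            # greedy view: only the first step
cumulative = cost_to_go(step_costs)  # what RL seeks to minimize

print(immediate)   # 2.0
print(cumulative)  # 7.5
```

A greedy learner that minimized only `immediate` could still accumulate a large `cumulative` cost; reinforcement learning optimizes the latter.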
Application of reinforcement learning
Major application area
♦ Game theory.
♦ Simulation-based optimization.
♦ Operations research.
♦ Control theory.
♦ Swarm intelligence.
♦ Multi-agent systems.
♦ Information theory.
Note:
⇒ Reinforcement learning is also called approximate dynamic
programming.
Approach for studying reinforcement learning
Classical approach: learning takes place through a process of reward
and penalty, with the goal of achieving highly skilled behavior.
Modern approach:
⇒ Based on a mathematical framework, such as dynamic programming.
⇒ It decides on the course of action by considering possible future states
without actually experiencing them.
⇒ It emphasizes planning.
⇒ It treats learning as a credit assignment problem.
⇒ Credit or blame is assigned to each of the interacting decisions.
Dynamic programming
Basics
⇒ How can an agent/decision maker/learning system improve its
long-term performance in a stochastic environment?
⇒ By attaining improved long-term performance without disrupting
short-term performance.
Markov decision process (MDP)
Markov decision process (MDP):
Key features of MDP
♦ The environment is modeled through a probabilistic framework; some
known probability mass function (pmf) may be the basis for the
modeling.
♦ It consists of a finite set of discrete states.
♦ The states do not carry any past statistics (the Markov property).
♦ Through a well-defined pmf, a set of discrete sample data is created.
♦ For each environmental state, there is a finite set of possible actions
that the agent may take.
♦ Every time the agent takes an action, a certain cost is incurred.
♦ States are observed, actions are taken, and costs are incurred at
discrete times.
Continued–
An MDP operates in a stochastic environment; it is nothing but a
random process.
The decision action is a time-dependent random variable.
Mathematical description:
⇒ Si is the ith state at sample instant n.
⇒ Sj is the next state at sample instant n + 1.
⇒ pij is the transition probability, ∀ 1 ≤ i ≤ k and 1 ≤ j ≤ k:
pij(Ai) = P(Xn+1 = Sj | Xn = Si, An = Ai)
⇒ Ai is the ith action taken by the agent at sample instant n.
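The transition probability pij(Ai) can be represented directly as a table of pmfs keyed by (state, action). The states, actions, and probabilities below are illustrative assumptions, not values from the lecture.

```python
import random

# Minimal MDP transition sketch. P[s][a] is a pmf over next states,
# matching p_ij(A_i) = P(X_{n+1} = S_j | X_n = S_i, A_n = A_i).
# All entries are illustrative.
P = {
    "S1": {"A1": {"S1": 0.7, "S2": 0.3},
           "A2": {"S1": 0.1, "S2": 0.9}},
    "S2": {"A1": {"S1": 0.4, "S2": 0.6},
           "A2": {"S1": 0.8, "S2": 0.2}},
}

def step(state, action, rng=random):
    """Sample the next state from the pmf attached to (state, action)."""
    pmf = P[state][action]
    states, probs = zip(*pmf.items())
    return rng.choices(states, weights=probs)[0]

# Each pmf must sum to one.
for s in P:
    for a in P[s]:
        assert abs(sum(P[s][a].values()) - 1.0) < 1e-12
```

Calling `step("S1", "A1")` returns "S1" with probability 0.7 and "S2" with probability 0.3, which is exactly the conditional distribution written above.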
Markov chain rule
The Markov chain rule is based on the partition theorem.
Statement of the partition theorem: Let B1, ..., Bm form a partition of Ω.
Then for any event A,
P(A) = Σ_{i=1}^{m} P(A ∩ Bi) = Σ_{i=1}^{m} P(A | Bi) P(Bi)
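The partition theorem can be checked numerically on a toy example (the probabilities below are made up for illustration):

```python
# Toy check of the partition theorem: B1, B2, B3 partition Omega,
# so P(A) = sum_i P(A | B_i) * P(B_i). Numbers are illustrative.
P_B = [0.2, 0.5, 0.3]            # P(B_i); sums to 1 (a partition)
P_A_given_B = [0.9, 0.1, 0.4]    # P(A | B_i)

P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
print(P_A)  # 0.35
```

This is the same decomposition the Markov chain uses when summing over intermediate states.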
Markov property
1 The basic property of a Markov chain is that only the most recent
point in the trajectory affects what happens next:
P(Xn+1 | Xn, Xn−1, ..., X0) = P(Xn+1 | Xn)
2 Transition matrix (stochastic matrix):
P = [ p11 p12 ... p1K
      p21 p22 ... p2K
      ...
      pK1 pK2 ... pKK ]
⇒ Each row sums to unity → Σ_j pij = 1
⇒ That is, p11 + p12 + ... + p1K = 1, but the columns need not sum to
one: p11 + p21 + ... + pK1 ≠ 1 in general.
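The row-sum property is easy to verify on a concrete stochastic matrix; the entries below are illustrative, not from the lecture.

```python
# A small transition (stochastic) matrix with illustrative entries.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
]

# Every row is a pmf over next states, so each row sums to one.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

# Columns, by contrast, need not sum to one.
col0 = sum(row[0] for row in P)
print(col0)  # 0.7 for this matrix
```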
Continued–
3 n-step transition probability:
Statement: Let X0, X1, X2, ... be a Markov chain with state space
S = {1, 2, ..., N}. Recall that the elements of the transition matrix P
are defined as:
pij = P(X1 = j | X0 = i) = P(Xn+1 = j | Xn = i) for any n.
⇒ pij is the probability of making a transition from state i to state j in a
single step.
Q What is the probability of making a transition from state i to state j
over two steps? In other words, what is P(X2 = j | X0 = i)?
Ans p_ij^(2) = Σ_k pik pkj, the (i, j) entry of P².
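The two-step answer above is just one matrix multiplication. A minimal sketch with an assumed 2×2 transition matrix:

```python
# Two-step transition probabilities via P^2. Entry (i, j) of P @ P is
# P(X_2 = j | X_0 = i) = sum_k p_ik * p_kj. The matrix is illustrative.
P = [
    [0.9, 0.1],
    [0.4, 0.6],
]

def matmul(A, B):
    """Plain matrix product of two square lists-of-lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)
# p_00^(2) = 0.9*0.9 + 0.1*0.4 = 0.85
print(P2[0][0])  # 0.85
```

Note that P² is itself a stochastic matrix: its rows also sum to one, so the same reasoning extends to n steps via Pⁿ.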
References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
J. Grus, Data Science from Scratch: First Principles with Python. O'Reilly Media, 2019.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, Machine Learning, 2006, vol. 9.
S. Haykin, Neural Networks and Learning Machines, 3/E. Pearson Education India, 2010.