Reinforcement Learning in Economics and Finance
(a modest state-of-the-art)
Arthur Charpentier, Romuald Elie & Carl Remlinger
@freakonometrics freakonometrics arXiv:2003.10014 1 / 18
The authors
Arthur Charpentier, professor, maths dpt UQAM (Montréal, Canada),
previously econ. dpt at Université de Rennes (France).
Works on actuarial modeling, insurance & data science (see freakonometrics)
Romuald Elie, professor, maths dpt Université Gustave Eiffel (Paris, France),
visiting UC Berkeley (U.S.).
Carl Remlinger, PhD student, maths dpt Université Gustave Eiffel (Paris, France).
[Diagram: the survey sits at the intersection of economics and cs | ml | ai]
Reinforcement Learning
[Diagram: agent–environment loop, with (1) action a_t ∈ A, (2) reward r_t, (3) state s_{t+1} ∈ S]
the learner takes an action a_t ∈ A (while at state s_t)
the learner obtains a (short-term) reward r_t ∈ R
then the state of the world becomes s_{t+1} ∈ S
Reinforcement (Sequential) Learning
[Diagram: state–action chain s_{t−1}, s_t, s_{t+1}, with actions a_{t−1}, a_t, a_{t+1} and transitions T(s_t, a_t, ·)]
Let T be a transition function S × A × S → [0, 1] (Markov dynamics) where:
P[s_{t+1} = s′ | s_t = s, a_t = a, a_{t−1}, a_{t−2}, …] = T(s, a, s′)
A policy is a decision rule, specifying the action taken at each state of the world:
either deterministic, π : S → A, with π(s) ∈ A
or randomized, π : S × A → [0, 1], i.e. the probability of choosing action a ∈ A
Given a policy π, its expected reward is
V^π(s_t) = E_P[ Σ_{k=0}^∞ γ^k r_{t+k} | s_t, π ], where a ∼ π(s_t, ·)
Machine Learning
Data chunk D = {(x_1, y_1), …, (x_n, y_n)} on X × Y.
E.g. classification, Y = {0, 1}. With a logistic regression,
f(x) = (1 + e^{−x⊤β})^{−1} ∈ [0, 1] =: A
Given a loss ℓ : A × Y → R₊, define regret as
R_n = (1/n) Σ_{i=1}^n ℓ(f(x_i), y_i) − inf_{f∈F} (1/n) Σ_{i=1}^n ℓ(f(x_i), y_i),
the infimum being the optimal (oracle) risk.
As proved in Robbins (1952), minimizing regret ←→ maximizing a reward
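A toy numeric check of this regret (the data and the candidate predictor are illustrative; F is taken here to be the class of constant predictors, whose oracle under log-loss is the empirical mean):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)  # toy labels y_i in {0, 1} (illustrative)

def log_loss(p, y):
    # loss ell(p, y) with actions p in A = [0, 1]
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# a candidate predictor: the constant p = 0.3
risk_f = log_loss(np.full(len(y), 0.3), y).mean()

# oracle within F = constant predictors: the empirical mean minimizes log-loss
p_star = y.mean()
risk_oracle = log_loss(np.full(len(y), p_star), y).mean()

regret = risk_f - risk_oracle  # R_n >= 0 by definition of the infimum
print(regret)
```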
Online Learning
Consider a dynamic setting, with sequential data (x_t, y_t)
Define regret for some forecasting rule f_t
R_T = (1/T) Σ_{t=1}^T ℓ(f_t(x_t), y_{t+1}) − inf_{f∈F} (1/T) Σ_{t=1}^T ℓ(f(x_t), y_{t+1})
Classical model averaging (see online aggregator)
k models providing forecasts ₜŷ_{t+1} = (ₜŷ¹_{t+1}, …, ₜŷᵏ_{t+1}); define ₜŷ^ω_{t+1} = ω⊤ ₜŷ_{t+1}
R_T = (1/T) Σ_{t=1}^T ℓ(ŷ*_t, y_t) − inf_{ω∈Ω} (1/T) Σ_{t=1}^T ℓ(ₜŷ^ω_{t+1}, y_{t+1})
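One classical aggregator of this kind is the exponentially weighted average forecaster; a minimal sketch under squared loss (the sequence, the experts, and the learning rate η are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, eta = 500, 2.0  # horizon and learning rate (illustrative)
y = np.sin(np.linspace(0, 10, T)) + 0.1 * rng.standard_normal(T)

# k = 3 experts: biased copies of the sequence (one of them is unbiased)
experts = np.stack([y + b for b in (0.0, 0.5, -0.3)])

k = experts.shape[0]
w = np.ones(k) / k
cum_loss = np.zeros(k)
agg_loss = 0.0
for t in range(T):
    pred = w @ experts[:, t]              # aggregated forecast
    agg_loss += (pred - y[t]) ** 2
    cum_loss += (experts[:, t] - y[t]) ** 2
    w = np.exp(-eta * cum_loss)           # exponential weights
    w /= w.sum()

regret = agg_loss - cum_loss.min()        # regret vs. the best single expert
print(regret / T)
```

The weights concentrate exponentially fast on the best-performing expert, so the average regret vanishes as T grows.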
(multi-armed) Bandits
Pulling arm k yields a (random) reward R_k, with mean Q(k) = E(R_k). The optimal policy is
a* = argmax_{a∈{1,…,K}} Q(a), with return Q* = Q(a*)
Consider a sequential game, with a_{t+1} = f_t(a_t, r_t, …, a_1, r_1)
The regret of a bandit algorithm is thus:
R_T(f) = T Q* − E[ Σ_{t=1}^T r_t ] = T Q* − E[ Σ_{t=1}^T Q(a_t) ],
the term T Q* being the oracle.
Classical exploration-exploitation tradeoff.
See Rothschild (1974) or Weitzman (1979) for economic applications.
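A minimal ε-greedy sketch of this tradeoff on a Bernoulli bandit (arm means, horizon, and ε are illustrative), tracking the realized regret T Q* − Σ r_t:

```python
import numpy as np

rng = np.random.default_rng(2)
Q_true = np.array([0.2, 0.5, 0.8])  # arm means Q(k) (illustrative)
K, T, eps = len(Q_true), 5000, 0.1

Q_hat = np.zeros(K)   # empirical mean reward per arm
n = np.zeros(K)       # pull counts
reward_sum = 0.0

for t in range(T):
    if rng.random() < eps:
        a = int(rng.integers(K))          # explore
    else:
        a = int(np.argmax(Q_hat))         # exploit
    r = float(rng.random() < Q_true[a])   # Bernoulli reward
    n[a] += 1
    Q_hat[a] += (r - Q_hat[a]) / n[a]     # incremental mean update
    reward_sum += r

regret = T * Q_true.max() - reward_sum    # realized version of R_T(f)
print(regret / T)
```

With constant ε the average regret stays bounded away from zero (forced exploration of suboptimal arms); decaying ε or UCB-type indices remove that floor.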
Reinforcement Learning Framework (1)
Consider the (infinite time horizon) discounted return
G_t = Σ_{k=0}^∞ γ^k r_{t+1+k} = r_{t+1} + γ G_{t+1}
where 0 ≤ γ ≤ 1 is the discount factor
To quantify the performance of an action, define the Q-value on S × A:
Q^π(s_t, a_t) = E_P[ G_t | s_t, a_t, π ]   (1)
In order to maximize the reward, as in bandits, the optimal strategy is characterized by
the optimal policies
π*(s_t) = argmax_{a∈A} Q*(s_t, a), where Q*(s_t, a_t) = max_{π∈Π} Q^π(s_t, a_t),
while V*(s_t) = max_{a∈A} Q*(s_t, a) (see Sutton and Barto (1998)).
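Q-learning (Watkins and Dayan (1992)) estimates Q* directly from observed transitions; a tabular sketch on a small randomly generated MDP (all parameters are illustrative), checked against value iteration:

```python
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma = 3, 2, 0.9

# a small random MDP (illustrative): T[s, a, s'] transitions, r[s, a] rewards
T = rng.random((nS, nA, nS)); T /= T.sum(axis=2, keepdims=True)
r = rng.random((nS, nA))

# Q* via value iteration on the Bellman optimality operator
Q_star = np.zeros((nS, nA))
for _ in range(1000):
    Q_star = r + gamma * T @ Q_star.max(axis=1)

# tabular Q-learning: epsilon-greedy exploration, decaying learning rates
Q = np.zeros((nS, nA))
visits = np.zeros((nS, nA))
s = 0
for t in range(100000):
    a = int(rng.integers(nA)) if rng.random() < 0.3 else int(Q[s].argmax())
    s2 = rng.choice(nS, p=T[s, a])
    visits[s, a] += 1
    alpha = 1.0 / visits[s, a] ** 0.6    # Robbins-Monro step sizes
    Q[s, a] += alpha * (r[s, a] + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(np.abs(Q - Q_star).max())  # approximation error
```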
Reinforcement Learning Framework (2)
Bellman’s equation is here
V^π(s_t) = Σ_{a∈A} π(a|s_t) [ r(s_t, a) + γ Σ_{s′∈S} T(s_t, a, s′) V^π(s′) ],
or, with n states of the world, if T^π_{ij} = Σ_{a∈A} π(a|i) T(i, a, j) and r^π_i = Σ_{a∈A} π(a|i) r(i, a),
(V^π(1), …, V^π(n))⊤ = (r^π_1, …, r^π_n)⊤ + γ [T^π_{ij}]_{i,j=1,…,n} (V^π(1), …, V^π(n))⊤
i.e. V^π = r^π + γ T^π V^π = T^π(V^π), using Bellman's operator T^π
(then use the contraction mapping theorem, see Denardo (1967)).
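Both routes can be sketched numerically: solving the linear system (I − γ T^π) V = r^π directly, and iterating the Bellman operator, which converges by the contraction argument (the policy's transition matrix and rewards are randomly generated, for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, gamma = 4, 0.9

# a fixed policy pi summarized by T^pi (row-stochastic) and r^pi (illustrative)
T_pi = rng.random((n, n)); T_pi /= T_pi.sum(axis=1, keepdims=True)
r_pi = rng.random(n)

# direct solve of V = r + gamma T V, i.e. (I - gamma T) V = r
V_direct = np.linalg.solve(np.eye(n) - gamma * T_pi, r_pi)

# fixed-point iteration of the Bellman operator (a gamma-contraction in sup norm)
V = np.zeros(n)
for _ in range(500):
    V = r_pi + gamma * T_pi @ V

print(np.abs(V - V_direct).max())  # both recover V^pi
```

The direct solve is O(n³) but exact; the iteration only needs matrix-vector products and its error shrinks by a factor γ per step.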
Inventory problem (Hellwig (1973))
Action a_t ∈ A = {0, 1, 2, . . . , m} denotes the number of ordered items arriving on the
morning of day t, purchased at individual price p.
States s_t ∈ S = {0, 1, 2, . . . , m} are the number of items available at the end of the
day (before ordering new items for the next day).
Then, the state dynamics is
s_{t+1} = (min{(s_t + a_t), m} − ε_t)₊
where (ε_t) is the unpredictable demand, independent and identically distributed variables.
(s_t) is a Markov chain, with
T(s, a, s′) = P[s_{t+1} = s′ | s_t = s, a_t = a] = P[ε_t = (min{(s + a), m} − s′)₊]
The reward function R is such that, on day t, the revenue made is
r_t = −p a_t + p̃ ε_t = −p a_t + p̃ (min{(s_t + a_t), m} − s_{t+1})₊ = R(s_t, a_t, s_{t+1})
where p̃ is the price at which items are sold to consumers (and p the price at which items
are purchased).
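A simulation sketch of these dynamics (the demand distribution, the prices, and the ordering policy are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
m, p, p_tilde = 10, 1.0, 1.5  # capacity, purchase price p, sale price p~ (illustrative)

s, total = 0, 0.0
for t in range(1000):
    a = int(rng.integers(0, m + 1))    # an arbitrary ordering policy
    eps = int(rng.integers(0, m + 1))  # i.i.d. demand, uniform (an assumption)
    stocked = min(s + a, m)
    s_next = max(stocked - eps, 0)     # s_{t+1} = (min(s_t + a_t, m) - eps_t)_+
    sold = stocked - s_next            # items actually sold = min(stocked, eps)
    total += -p * a + p_tilde * sold   # r_t = R(s_t, a_t, s_{t+1})
    s = s_next

print(total / 1000)  # average daily profit under the random policy
```

An RL agent would replace the random ordering rule with a policy learned to maximize the discounted sum of these rewards.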
Econ: Consumption & Income Dynamics
Consider an infinitely lived agent, with utility u(c_t) when consuming c_t ≥ 0 in period t.
The agent receives a random income y_t at time t, and assume that (y_t) is a Markov
process, T(s, s′) = P[y_{t+1} = s′ | y_t = s].
Let w_t denote the wealth of the agent at time t, so that w_{t+1} = w_t + y_t − c_t.
Assume that wealth must be non-negative, so c_t ≤ w_t + y_t. For convenience,
w_0 = 0, as in Lettau and Uhlig (1999). At time t, given state s_t = (w_t, y_t), we seek c_t
solution of
v(w_t, y_t) = max_{c∈[0, w_t+y_t]} { u(c) + γ Σ_{y′} v(w_t + y_t − c, y′) T(y_t, y′) }
This is a standard recursive consumption model, discussed in Ljungqvist and Sargent
(2018) or Hansen and Sargent (2013)
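This Bellman equation can be solved by value function iteration on a discretized state space; a sketch with an illustrative utility, income process, and wealth grid:

```python
import numpy as np

gamma, Wmax = 0.9, 20
incomes = np.array([0, 1, 2])             # discrete income states (illustrative)
T = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])           # income transition matrix (illustrative)

def u(c):                                 # utility function (an assumption)
    return np.log(1.0 + c)

nW, nY = Wmax + 1, len(incomes)
v = np.zeros((nW, nY))
for _ in range(300):                      # value function iteration
    v_new = np.empty_like(v)
    for iw in range(nW):
        for iy, y in enumerate(incomes):
            cash = iw + y                 # resources available: w_t + y_t
            best = -np.inf
            for c in range(cash + 1):     # consume c, carry w' = cash - c
                w_next = min(cash - c, Wmax)
                best = max(best, u(c) + gamma * v[w_next] @ T[iy])
            v_new[iw, iy] = best
    v = v_new

print(v[0, 0], v[Wmax, nY - 1])  # value increases with wealth and income
```

Dynamic programming needs T to be known; the RL point of view replaces this full-knowledge iteration with learning from sampled income paths.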
Econ: Bounded Rationality & Experiments
With adaptive learning, Marcet and Sargent (1989a,b) proved convergence to a rational
expectations equilibrium.
Leimar and McNamara (2019) suggested that adaptive and reinforcement learning lead
to bounded rationality, while Abel (2019) motivates reinforcement learning as a suitable
formalism for studying boundedly rational agents, since “at a high level, Reinforcement
Learning unifies learning and decision making into a single, general framework”.
Thompson (1933) introduced this idea of adaptive treatment assignment.
Weber (1992) proved that this problem can be expressed using multi-armed bandits,
and that the optimal solution to this bandit problem is to choose the arm with the
highest Gittins index, which can be related to the so-called Thompson sampling strategy,
intensively used for A/B testing and experimental economics, see Chattopadhyay and
Duflo (2004).
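A minimal Thompson sampling sketch on a Bernoulli bandit (arm means are illustrative): sample one draw per arm from its Beta posterior, and play the arm with the highest draw:

```python
import numpy as np

rng = np.random.default_rng(7)
Q_true = np.array([0.3, 0.5, 0.7])     # Bernoulli arm means (illustrative)
K, T = len(Q_true), 5000
alpha = np.ones(K)                     # Beta(1, 1) priors on each arm
beta = np.ones(K)

pulls = np.zeros(K)
for t in range(T):
    theta = rng.beta(alpha, beta)      # one posterior draw per arm
    a = int(theta.argmax())            # play the arm with the highest draw
    r = float(rng.random() < Q_true[a])
    alpha[a] += r                      # conjugate Beta-Bernoulli update
    beta[a] += 1 - r
    pulls[a] += 1

print(pulls / T)  # pulls concentrate on the best arm
```

Exploration is automatic: arms with wide posteriors occasionally produce the highest draw, and are resampled until the posterior rules them out.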
Operations Research & Stochastic Games
Maskin and Tirole (1988) introduced the concept of Markov perfect equilibrium.
Brown (1951) suggested that firms could form beliefs about competitors’ choice
probabilities, using some fictitious plays, also called Cournot learning (studied more
deeply in Hopkins (2002)).
Bernheim (1984) and Pearce (1984) added assumptions on firms’ beliefs, called
rationalizability, under which we can end up with Nash equilibria.
Finance
Market Micro-Structure: order book dynamics seen as the aggregation of other traders’
actions (buy or sell orders), see Vyetrenko and Xu (2019)
Portfolio Allocation: see Li and Hoi (2014)
Risk Management: with realistic market frictions or imperfections.
But finance is related to risk measures...
The discounted return G_t is a random variable, a function of s_t, a_t and π, that can be
denoted Φ^π(s_t, a_t). We defined
Q^π(s_t, a_t) = E_P[ Φ^π(s_t, a_t) | s_t, a_t, π ],
but we can consider other functionals of the distribution of Φ^π(s_t, a_t), see Distributional
Reinforcement Learning, Bellemare et al. (2017).
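A Monte Carlo sketch of this distributional view (the reward law and the single-state setting are assumptions): simulate the law of Φ^π, and compare its mean, which is Q^π, with a risk-sensitive functional such as a low quantile:

```python
import numpy as np

rng = np.random.default_rng(8)
gamma, n_paths, horizon = 0.9, 20000, 100

def simulate_return():
    # single-state toy: each step the (fixed) policy earns a Gaussian reward
    rewards = rng.normal(1.0, 2.0, size=horizon)     # illustrative reward law
    return (gamma ** np.arange(horizon)) @ rewards   # truncated discounted return

G = np.array([simulate_return() for _ in range(n_paths)])

# Q^pi keeps only the mean; a distributional view keeps the whole law of Phi^pi
print(G.mean())              # close to 1 / (1 - gamma)
print(np.quantile(G, 0.05))  # a risk-sensitive functional (5% quantile)
```

Two policies with the same Q^π can have very different return distributions; risk measures in finance act on the latter.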
Wrap-Up
Intensive recent literature related to Reinforcement Learning (CS, AI, ML)
Focus on algorithms
(Q-learning – Watkins and Dayan (1992) – TD(λ), deep RL, etc)
Connections with many applications in economics and finance
dynamic programming, operations research, stochastic games, risk measures, etc.
Several recent extensions
inverse reinforcement learning (Miller (1984), Pakes (1986))
distributional reinforcement learning
see arXiv:2003.10014 for more details, and references
References I
Abel, D. (2019). Concepts in Bounded Rationality: Perspectives from Reinforcement Learning. PhD
thesis, Brown University.
Bellemare, M. G., Dabney, W., and Munos, R. (2017). A distributional perspective on reinforcement
learning. arXiv:1707.06887.
Bernheim, B. D. (1984). Rationalizable strategic behavior. Econometrica, 52(4):1007–1028.
Brown, G. W. (1951). Iterative solutions of games by fictitious play. In Koopmans, T., editor, Activity
Analysis of Production and Allocation, pages 374–376. John Wiley & Sons, Inc.
Chattopadhyay, R. and Duflo, E. (2004). Women as policy makers: Evidence from a randomized policy
experiment in India. Econometrica, 72(5):1409–1443.
Denardo, E. V. (1967). Contraction mappings in the theory underlying dynamic programming. SIAM
Review, 9(2):165–177.
Hansen, L. P. and Sargent, T. J. (2013). Recursive Models of Dynamic Linear Economies. The
Gorman Lectures in Economics. Princeton University Press.
Hellwig, M. F. (1973). Sequential models in economic dynamics. PhD thesis, Massachusetts Institute
of Technology, Department of Economics.
References II
Hopkins, E. (2002). Two competing models of how people learn in games. Econometrica,
70(6):2141–2166.
Leimar, O. and McNamara, J. (2019). Learning leads to bounded rationality and the evolution of
cognitive bias in public goods games. Nature Scientific Reports, 9:16319.
Lettau, M. and Uhlig, H. (1999). Rules of thumb versus dynamic programming. American Economic
Review, 89(1):148–174.
Li, B. and Hoi, S. C. (2014). Online portfolio selection: A survey. ACM Computing Surveys (CSUR),
46(3):1–36.
Ljungqvist, L. and Sargent, T. J. (2018). Recursive Macroeconomic Theory. MIT Press, 4 edition.
Marcet, A. and Sargent, T. J. (1989a). Convergence of least-squares learning in environments with
hidden state variables and private information. Journal of Political Economy, 97(6):1306–1322.
Marcet, A. and Sargent, T. J. (1989b). Convergence of least squares learning mechanisms in
self-referential linear stochastic models. Journal of Economic Theory, 48(2):337–368.
Maskin, E. and Tirole, J. (1988). A theory of dynamic oligopoly, I: Overview and quantity competition
with large fixed costs. Econometrica, 56:549–569.
Miller, R. A. (1984). Job matching and occupational choice. Journal of Political Economy,
92(6):1086–1120.
References III
Pakes, A. (1986). Patents as options: Some estimates of the value of holding European patent stocks.
Econometrica, 54(4):755–784.
Pearce, D. G. (1984). Rationalizable strategic behavior and the problem of perfection. Econometrica,
52(4):1029–1050.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American
Mathematical Society, 58(5):527–535.
Rothschild, M. (1974). A two-armed bandit theory of market pricing. Journal of Economic Theory,
9(2):185–202.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of
the evidence of two samples. Biometrika, 25(3/4):285–294.
Vyetrenko, S. and Xu, S. (2019). Risk-sensitive compact decision trees for autonomous execution in
presence of simulated market response. arXiv preprint arXiv:1906.02312.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.
Weber, R. (1992). On the Gittins index for multiarmed bandits. The Annals of Applied Probability,
2(4):1024–1033.
Weitzman, M. L. (1979). Optimal search for the best alternative. Econometrica, 47(3):641–654.

Contenu connexe

Tendances (20)

Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Side 2019 #4
Side 2019 #4Side 2019 #4
Side 2019 #4
 
Varese italie seminar
Varese italie seminarVarese italie seminar
Varese italie seminar
 
Side 2019 #3
Side 2019 #3Side 2019 #3
Side 2019 #3
 
Slides econometrics-2018-graduate-1
Slides econometrics-2018-graduate-1Slides econometrics-2018-graduate-1
Slides econometrics-2018-graduate-1
 
Side 2019 #12
Side 2019 #12Side 2019 #12
Side 2019 #12
 
Side 2019 #5
Side 2019 #5Side 2019 #5
Side 2019 #5
 
Slides econometrics-2018-graduate-2
Slides econometrics-2018-graduate-2Slides econometrics-2018-graduate-2
Slides econometrics-2018-graduate-2
 
Hands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive ModelingHands-On Algorithms for Predictive Modeling
Hands-On Algorithms for Predictive Modeling
 
Lausanne 2019 #4
Lausanne 2019 #4Lausanne 2019 #4
Lausanne 2019 #4
 
Econometrics, PhD Course, #1 Nonlinearities
Econometrics, PhD Course, #1 NonlinearitiesEconometrics, PhD Course, #1 Nonlinearities
Econometrics, PhD Course, #1 Nonlinearities
 
transformations and nonparametric inference
transformations and nonparametric inferencetransformations and nonparametric inference
transformations and nonparametric inference
 
Side 2019 #8
Side 2019 #8Side 2019 #8
Side 2019 #8
 
Predictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big dataPredictive Modeling in Insurance in the context of (possibly) big data
Predictive Modeling in Insurance in the context of (possibly) big data
 
Slides econ-lm
Slides econ-lmSlides econ-lm
Slides econ-lm
 
Slides ub-7
Slides ub-7Slides ub-7
Slides ub-7
 
Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2
 
Mutualisation et Segmentation
Mutualisation et SegmentationMutualisation et Segmentation
Mutualisation et Segmentation
 
Sildes buenos aires
Sildes buenos airesSildes buenos aires
Sildes buenos aires
 

Similaire à Reinforcement Learning in Economics and Finance

Testing for Extreme Volatility Transmission
Testing for Extreme Volatility Transmission Testing for Extreme Volatility Transmission
Testing for Extreme Volatility Transmission Arthur Charpentier
 
Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Asma Ben Slimene
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Asma Ben Slimene
 
Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017Arthur Charpentier
 
On the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCAOn the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCADmitrii Ignatov
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsChristian Robert
 
Controlled sequential Monte Carlo
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo JeremyHeng10
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 

Similaire à Reinforcement Learning in Economics and Finance (20)

Slides econ-lm
Slides econ-lmSlides econ-lm
Slides econ-lm
 
Slides erasmus
Slides erasmusSlides erasmus
Slides erasmus
 
Classification
ClassificationClassification
Classification
 
Phd Proposal
Phd ProposalPhd Proposal
Phd Proposal
 
Slides ub-3
Slides ub-3Slides ub-3
Slides ub-3
 
Slides ACTINFO 2016
Slides ACTINFO 2016Slides ACTINFO 2016
Slides ACTINFO 2016
 
Testing for Extreme Volatility Transmission
Testing for Extreme Volatility Transmission Testing for Extreme Volatility Transmission
Testing for Extreme Volatility Transmission
 
Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...Research internship on optimal stochastic theory with financial application u...
Research internship on optimal stochastic theory with financial application u...
 
Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...Presentation on stochastic control problem with financial applications (Merto...
Presentation on stochastic control problem with financial applications (Merto...
 
Slides ub-2
Slides ub-2Slides ub-2
Slides ub-2
 
smtlecture.7
smtlecture.7smtlecture.7
smtlecture.7
 
Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017
 
Slides Bank England
Slides Bank EnglandSlides Bank England
Slides Bank England
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 
Varese italie #2
Varese italie #2Varese italie #2
Varese italie #2
 
On the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCAOn the Family of Concept Forming Operators in Polyadic FCA
On the Family of Concept Forming Operators in Polyadic FCA
 
Statistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: ModelsStatistics (1): estimation, Chapter 1: Models
Statistics (1): estimation, Chapter 1: Models
 
Stochastic Processes - part 6
Stochastic Processes - part 6Stochastic Processes - part 6
Stochastic Processes - part 6
 
Controlled sequential Monte Carlo
Controlled sequential Monte Carlo Controlled sequential Monte Carlo
Controlled sequential Monte Carlo
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 

Plus de Arthur Charpentier

Plus de Arthur Charpentier (13)

Family History and Life Insurance
Family History and Life InsuranceFamily History and Life Insurance
Family History and Life Insurance
 
ACT6100 introduction
ACT6100 introductionACT6100 introduction
ACT6100 introduction
 
Family History and Life Insurance (UConn actuarial seminar)
Family History and Life Insurance (UConn actuarial seminar)Family History and Life Insurance (UConn actuarial seminar)
Family History and Life Insurance (UConn actuarial seminar)
 
Control epidemics
Control epidemics Control epidemics
Control epidemics
 
STT5100 Automne 2020, introduction
STT5100 Automne 2020, introductionSTT5100 Automne 2020, introduction
STT5100 Automne 2020, introduction
 
Family History and Life Insurance
Family History and Life InsuranceFamily History and Life Insurance
Family History and Life Insurance
 
Optimal Control and COVID-19
Optimal Control and COVID-19Optimal Control and COVID-19
Optimal Control and COVID-19
 
Slides OICA 2020
Slides OICA 2020Slides OICA 2020
Slides OICA 2020
 
Lausanne 2019 #3
Lausanne 2019 #3Lausanne 2019 #3
Lausanne 2019 #3
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
 
Side 2019 #11
Side 2019 #11Side 2019 #11
Side 2019 #11
 
Pareto Models, Slides EQUINEQ
Pareto Models, Slides EQUINEQPareto Models, Slides EQUINEQ
Pareto Models, Slides EQUINEQ
 
Econ. Seminar Uqam
Econ. Seminar UqamEcon. Seminar Uqam
Econ. Seminar Uqam
 

Dernier

VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...dipikadinghjn ( Why You Choose Us? ) Escorts
 
The Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfThe Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfGale Pooley
 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfGale Pooley
 
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Basic concepts related to Financial modelling
Basic concepts related to Financial modellingBasic concepts related to Financial modelling
Basic concepts related to Financial modellingbaijup5
 
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...Call Girls in Nagpur High Profile
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptxFinTech Belgium
 
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptxFinTech Belgium
 
Indore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfIndore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfSaviRakhecha1
 
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...dipikadinghjn ( Why You Choose Us? ) Escorts
 
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...
VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...
VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...dipikadinghjn ( Why You Choose Us? ) Escorts
 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfGale Pooley
 
Booking open Available Pune Call Girls Wadgaon Sheri 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Wadgaon Sheri  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Wadgaon Sheri  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Wadgaon Sheri 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...ssifa0344
 
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...dipikadinghjn ( Why You Choose Us? ) Escorts
 
Stock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfStock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfMichael Silva
 
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
00_Main ppt_MeetupDORA&CyberSecurity.pptx
00_Main ppt_MeetupDORA&CyberSecurity.pptx00_Main ppt_MeetupDORA&CyberSecurity.pptx
00_Main ppt_MeetupDORA&CyberSecurity.pptxFinTech Belgium
 

Dernier (20)

VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
VIP Call Girl in Mira Road 💧 9920725232 ( Call Me ) Get A New Crush Everyday ...
 
The Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdfThe Economic History of the U.S. Lecture 18.pdf
The Economic History of the U.S. Lecture 18.pdf
 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdf
 
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Koregaon Park Call Me 7737669865 Budget Friendly No Advance Booking
 
Basic concepts related to Financial modelling
Basic concepts related to Financial modellingBasic concepts related to Financial modelling
Basic concepts related to Financial modelling
 
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...Top Rated  Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
Top Rated Pune Call Girls Viman Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Sex...
 
03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx03_Emmanuel Ndiaye_Degroof Petercam.pptx
03_Emmanuel Ndiaye_Degroof Petercam.pptx
 
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
05_Annelore Lenoir_Docbyte_MeetupDora&Cybersecurity.pptx
 
Indore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdfIndore Real Estate Market Trends Report.pdf
Indore Real Estate Market Trends Report.pdf
 
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
VIP Independent Call Girls in Andheri 🌹 9920725232 ( Call Me ) Mumbai Escorts...
 
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Shivane  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Shivane 6297143586 Call Hot Indian Gi...
 
VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...
VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...
VIP Independent Call Girls in Bandra West 🌹 9920725232 ( Call Me ) Mumbai Esc...
 
The Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdfThe Economic History of the U.S. Lecture 30.pdf
The Economic History of the U.S. Lecture 30.pdf
 
Booking open Available Pune Call Girls Wadgaon Sheri 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Wadgaon Sheri  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Wadgaon Sheri  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Wadgaon Sheri 6297143586 Call Hot Ind...
 
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
TEST BANK For Corporate Finance, 13th Edition By Stephen Ross, Randolph Weste...
 
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
VIP Call Girl Service Andheri West ⚡ 9920725232 What It Takes To Be The Best ...
 
Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024
 
Stock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdfStock Market Brief Deck (Under Pressure).pdf
Stock Market Brief Deck (Under Pressure).pdf
 
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANIKA) Budhwar Peth Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
00_Main ppt_MeetupDORA&CyberSecurity.pptx
00_Main ppt_MeetupDORA&CyberSecurity.pptx00_Main ppt_MeetupDORA&CyberSecurity.pptx
00_Main ppt_MeetupDORA&CyberSecurity.pptx
 

Reinforcement Learning in Economics and Finance

Online Learning
Consider a dynamic setting, with sequential data (x_t, y_t).
Define the regret of a forecasting rule f_t as
R_T = (1/T) ∑_{t=1}^T ℓ(f_t(x_t), y_{t+1}) − inf_{f∈F} (1/T) ∑_{t=1}^T ℓ(f(x_t), y_{t+1})
Classical model averaging (see online aggregator): k models provide forecasts ŷ_{t+1} = (ŷ_{t+1}^1, …, ŷ_{t+1}^k) at time t; define the aggregated forecast ŷ_{t+1}^ω = ω⊤ŷ_{t+1}, with regret
R_T = (1/T) ∑_{t=1}^T ℓ(ŷ_{t+1}^⋆, y_{t+1}) − inf_{ω∈Ω} (1/T) ∑_{t=1}^T ℓ(ŷ_{t+1}^ω, y_{t+1}),
where ŷ_{t+1}^⋆ denotes the sequentially aggregated forecast.
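Such a sequential aggregation can be sketched with a standard exponentially weighted average forecaster (a generic choice, not necessarily the aggregator the slide refers to; the learning rate eta, the squared loss and the toy experts are all illustrative):

```python
import numpy as np

def ewa_aggregate(expert_preds, y, eta=2.0):
    """Exponentially weighted average forecaster over k experts.

    expert_preds: shape (T, k), expert_preds[t, j] = forecast of expert j
    for y[t]; y: shape (T,). Squared loss (an illustrative choice).
    Returns the aggregated forecasts and the final expert weights.
    """
    T, k = expert_preds.shape
    log_w = np.zeros(k)                       # log-weights, start uniform
    agg = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        agg[t] = w @ expert_preds[t]          # convex combination of forecasts
        log_w -= eta * (expert_preds[t] - y[t]) ** 2   # penalize each expert's loss
    return agg, w

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 6, 200))
experts = np.column_stack([y + 0.05 * rng.standard_normal(200),   # accurate expert
                           rng.standard_normal(200)])             # pure-noise expert
agg, w = ewa_aggregate(experts, y)
print(w)   # almost all weight on the accurate expert
```

The weights concentrate exponentially fast on the expert with the smallest cumulative loss, which is exactly what keeps the regret R_T small.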
(Multi-armed) Bandits
Pulling arm k yields a (random) reward R_k, with mean Q(k) = E(R_k). The optimal policy is
a⋆ = argmax_{a∈{1,…,K}} Q(a), with return Q⋆ = Q(a⋆).
Consider a sequential game, with a_{t+1} = f_t(a_t, r_t, …, a_1, r_1). The regret of a bandit algorithm is thus
R_T(f) = T·Q⋆ − E[∑_{t=1}^T r_t] = T·Q⋆ − E[∑_{t=1}^T Q(a_t)],
where T·Q⋆ is the oracle return.
This is the classical exploration-exploitation tradeoff. See Rothschild (1974) or Weitzman (1979) for economic applications.
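The regret definition above can be illustrated with a minimal ε-greedy player (the Bernoulli arm means, horizon and ε are made-up choices, and the helper name is hypothetical):

```python
import numpy as np

def epsilon_greedy_bandit(means, T=10_000, eps=0.1, seed=0):
    """eps-greedy on K Bernoulli arms; returns average per-round regret,
    i.e. (T*Q_star - sum_t Q(a_t)) / T, matching the slide's definition."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    q_hat = np.zeros(K)                      # running estimate of Q(k)
    regret = 0.0
    for _ in range(T):
        if rng.random() < eps:
            a = int(rng.integers(K))         # explore
        else:
            a = int(np.argmax(q_hat))        # exploit current estimates
        r = float(rng.random() < means[a])   # Bernoulli reward
        counts[a] += 1
        q_hat[a] += (r - q_hat[a]) / counts[a]   # incremental mean update
        regret += max(means) - means[a]
    return regret / T

avg_regret = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(avg_regret)   # far below what uniform random play would suffer
```

The residual regret comes precisely from the exploration rounds, which is the tradeoff the slide names.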
Reinforcement Learning Framework (1)
Consider the (infinite time horizon) discounted return
G_t = ∑_{k=0}^∞ γ^k r_{t+1+k} = r_{t+1} + γ G_{t+1},
where 0 ≤ γ ≤ 1 is the discount factor.
To quantify the performance of an action, define the Q-value on S × A:
Q^π(s_t, a_t) = E_P[G_t | s_t, a_t, π]   (1)
In order to maximize the reward, as with bandits, the optimal strategy is characterized by the optimal policies
π⋆(s_t) = argmax_{a∈A} Q⋆(s_t, a), where Q⋆(s_t, a_t) = max_{π∈Π} Q^π(s_t, a_t),
while V⋆(s_t) = max_{a∈A} Q⋆(s_t, a) (see Sutton and Barto (1998)).
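These Q-values are what tabular Q-learning (Watkins and Dayan, 1992) estimates from sampled transitions; a toy sketch on a made-up chain MDP (states, rates and episode counts are all illustrative):

```python
import numpy as np

def q_learning_chain(n=5, gamma=0.9, episodes=2000, alpha=0.1, eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain: states 0..n-1, actions
    {0: left, 1: right}; reward 1 on reaching the terminal state n-1,
    0 otherwise. All numerical choices are illustrative."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, 2))
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n - 1 else 0.0
            target = r + gamma * (0.0 if s2 == n - 1 else Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])   # TD update toward the Bellman target
            s = s2
    return Q

Q = q_learning_chain()
print(Q.argmax(axis=1))   # greedy policy: go right in every non-terminal state
```

The learned greedy policy argmax_a Q(s, a) recovers π⋆ on this toy problem, without ever using the transition function explicitly.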
Reinforcement Learning Framework (2)
Bellman's equation is here
V^π(s_t) = ∑_{a∈A} π(a|s_t) [ r(s_t, a) + γ ∑_{s′∈S} T(s_t, a, s′) V^π(s′) ],
or, with n states of the world, setting T^π_{ij} = ∑_{a∈A} π(a|i) T(i, a, j) and r^π_i = ∑_{a∈A} π(a|i) r(i, a),
(V^π(1), …, V^π(n))⊤ = (r^π_1, …, r^π_n)⊤ + γ T^π (V^π(1), …, V^π(n))⊤,
i.e. V^π = r^π + γ T^π V^π = T^π(V^π), using Bellman's operator T^π (then use the contraction mapping theorem, see Denardo (1967)).
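Since the finite-state Bellman system above is linear, both solution routes can be checked numerically; a minimal sketch (the 3-state chain, rewards and γ are made up for illustration):

```python
import numpy as np

# Policy evaluation for a fixed policy pi: V = r + gamma * T V is linear,
# so it can be solved directly, or by iterating the contraction operator T_pi.
gamma = 0.9
T_pi = np.array([[0.7, 0.3, 0.0],        # row i: transition probabilities under pi
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])
r_pi = np.array([1.0, 0.0, 2.0])         # expected one-step reward under pi

# Direct solve: (I - gamma T) V = r
V_direct = np.linalg.solve(np.eye(3) - gamma * T_pi, r_pi)

# Fixed-point iteration: V <- r + gamma T V, a gamma-contraction in sup-norm
V = np.zeros(3)
for _ in range(500):
    V = r_pi + gamma * T_pi @ V

print(np.max(np.abs(V - V_direct)))      # the two routes agree
```

The agreement of the two computations is exactly the contraction-mapping argument of Denardo (1967) made concrete: the iteration converges geometrically at rate γ to the unique fixed point.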
Inventory problem (Hellwig (1973))
Action a_t ∈ A = {0, 1, 2, …, m} denotes the number of ordered items arriving on the morning of day t, purchased at unit price p̄.
States s_t ∈ S = {0, 1, 2, …, m} count the number of items available at the end of the day (before ordering new items for the next day).
The state dynamics are then
s_{t+1} = (min{s_t + a_t, m} − ε_t)_+,
where ε_t is the unpredictable demand, a sequence of independent and identically distributed variables. Thus (s_t) is a Markov chain, with
T(s, a, s′) = P[s_{t+1} = s′ | s_t = s, a_t = a] = P[(min{s + a, m} − ε_t)_+ = s′].
The reward function R is such that, on day t, the revenue made is
r_t = −p̄ a_t + p (min{s_t + a_t, m} − s_{t+1})_+ = R(s_t, a_t, s_{t+1}),
where p is the price at which items are sold to consumers (and p̄ the price at which items are purchased).
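The transition function T(s, a, s′) of this inventory chain can be tabulated directly; a small sketch (the capacity m and the demand distribution are made-up examples, and `inventory_transition` is a hypothetical helper name):

```python
import numpy as np

def inventory_transition(m, demand_pmf):
    """Transition tensor T[s, a, s'] for the inventory problem:
    s_{t+1} = (min(s_t + a_t, m) - eps_t)_+, with i.i.d. demand eps_t
    distributed according to demand_pmf over {0, ..., m}."""
    T = np.zeros((m + 1, m + 1, m + 1))
    for s in range(m + 1):
        for a in range(m + 1):
            stock = min(s + a, m)            # items on the shelf after delivery
            for eps, p in enumerate(demand_pmf):
                T[s, a, max(stock - eps, 0)] += p   # demand above stock -> state 0
    return T

m = 3
pmf = np.array([0.2, 0.4, 0.3, 0.1])         # P(eps = 0), ..., P(eps = 3), sums to 1
T = inventory_transition(m, pmf)
print(T[0, 2])   # start empty, order 2: stock is 2, next state lands in {0, 1, 2}
```

Each slice T[s, a] is a probability distribution over next states, so the tensor plugs directly into the Bellman machinery of the previous slide.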
Econ: Consumption & Income Dynamics
Consider an infinitely lived agent, with utility u(c_t) when consuming c_t ≥ 0 in period t. The agent receives a random income y_t at time t, and assume that (y_t) is a Markov process, with T(s, s′) = P[y_{t+1} = s′ | y_t = s]. Let w_t denote the wealth of the agent at time t, so that w_{t+1} = w_t + y_t − c_t. Assume that wealth must be non-negative, so c_t ≤ w_t + y_t, and for convenience w_0 = 0, as in Lettau and Uhlig (1999).
At time t, given state s_t = (w_t, y_t), we seek c_t solution of
v(w_t, y_t) = max_{c∈[0, w_t+y_t]} { u(c) + γ ∑_{y′} v(w_t + y_t − c, y′) T(y_t, y′) }.
This is a standard recursive consumption model, discussed in Ljungqvist and Sargent (2018) or Hansen and Sargent (2013).
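This recursive problem can be solved by value-function iteration on a grid; a rough sketch under made-up choices (square-root utility, a two-state income chain, a coarse integer wealth grid with an upper cap W):

```python
import numpy as np

# Value-function iteration for the consumption/income model above.
gamma = 0.95
W = 20                                   # wealth grid {0, ..., W} (illustrative cap)
T_y = np.array([[0.7, 0.3],              # income transition P(y' | y), y in {0, 1}
                [0.4, 0.6]])
u = np.sqrt                              # an illustrative concave utility

V = np.zeros((W + 1, 2))                 # V[w, y] ~ v(w, y)
for _ in range(1000):
    V_new = np.empty_like(V)
    for w in range(W + 1):
        for y in (0, 1):
            cash = w + y                 # feasible consumption: c in {0, ..., cash}
            best = -np.inf
            for c in range(cash + 1):
                w2 = min(cash - c, W)    # carry the rest forward (capped at W)
                best = max(best, u(c) + gamma * (T_y[y] @ V[w2]))
            V_new[w, y] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break                            # Bellman operator is a gamma-contraction
    V = V_new

print(V[0, 0], V[W, 1])                  # value rises with both wealth and income
```

The monotonicity of the computed value function in wealth is the sanity check one would expect from the economic model.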
Econ: Bounded Rationality & Experiments
With adaptive learning, Marcet and Sargent (1989a,b) proved that there was convergence to a rational expectations equilibrium. Leimar and McNamara (2019) suggested that adaptive and reinforcement learning lead to bounded rationality, while Abel (2019) motivates reinforcement learning as a suitable formalism for studying boundedly rational agents, since "at a high level, Reinforcement Learning unifies learning and decision making into a single, general framework".
Thompson (1933) introduced the idea of adaptive treatment assignment. Weber (1992) proved that this problem can be expressed as a multi-armed bandit, whose optimal solution is to choose the arm with the highest Gittins index; this can be related to the so-called Thompson sampling strategy, intensively used for A/B testing and in experimental economics, see Chattopadhyay and Duflo (2004).
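Thompson sampling itself is easy to sketch for a Beta-Bernoulli A/B test (the conversion rates, horizon and helper name are invented for illustration):

```python
import numpy as np

def thompson_ab_test(p_true, T=5000, seed=0):
    """Beta-Bernoulli Thompson sampling for a two-arm A/B test.
    Returns the number of pulls allocated to each arm."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(2)
    beta = np.ones(2)                    # Beta(1, 1) prior on each conversion rate
    pulls = np.zeros(2, dtype=int)
    for _ in range(T):
        theta = rng.beta(alpha, beta)    # one posterior draw per arm
        a = int(np.argmax(theta))        # play the arm whose draw is largest
        r = float(rng.random() < p_true[a])
        alpha[a] += r                    # conjugate posterior update
        beta[a] += 1.0 - r
        pulls[a] += 1
    return pulls

pulls = thompson_ab_test([0.05, 0.10])
print(pulls)   # traffic concentrates on the better variant over time
```

Playing each arm with the posterior probability that it is best gives the automatic exploration-exploitation balance that makes the strategy popular in experiments.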
Operations Research & Stochastic Games
Maskin and Tirole (1988) introduced the concept of Markov perfect equilibrium. Brown (1951) suggested that firms could form beliefs about competitors' choice probabilities using fictitious play, also called Cournot learning (studied in more depth in Hopkins (2002)). Bernheim (1984) and Pearce (1984) added assumptions on firms' beliefs, called rationalizability, under which we can end up with Nash equilibria.
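Fictitious play as in Brown (1951) can be illustrated on matching pennies, a zero-sum game where empirical play frequencies are known to converge to the mixed Nash equilibrium (1/2, 1/2); a minimal sketch:

```python
import numpy as np

# Fictitious play: each player best-responds to the opponent's empirical
# action frequencies. Payoff matrix A is the row player's (column player gets -A).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])              # matching pennies

counts_row = np.array([1.0, 0.0])        # arbitrary initial beliefs
counts_col = np.array([0.0, 1.0])
for _ in range(20000):
    br_row = int(np.argmax(A @ (counts_col / counts_col.sum())))      # row's best response
    br_col = int(np.argmax(-(counts_row / counts_row.sum()) @ A))     # column's best response
    counts_row[br_row] += 1
    counts_col[br_col] += 1

freq_row = counts_row / counts_row.sum()
print(freq_row)   # close to the mixed equilibrium (0.5, 0.5)
```

In general non-zero-sum games fictitious play need not converge, which is one reason the rationalizability refinements of Bernheim (1984) and Pearce (1984) matter.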
Finance
Market micro-structure: see order-book dynamics as the aggregation of other traders' actions (buy or sell orders), see Vyetrenko and Xu (2019).
Portfolio allocation: see Li and Hoi (2014).
Risk management, with realistic market frictions or imperfections.
But finance is related to risk measures... The discounted return G_t is a random variable, a function of s_t, a_t and π, that can be denoted Φ^π(s_t, a_t). We defined
Q^π(s_t, a_t) = E_P[Φ^π(s_t, a_t) | s_t, a_t, π],
but we can consider another functional of the distribution of Φ^π(s_t, a_t); see Distributional Reinforcement Learning, Bellemare et al. (2017).
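The distributional point of view can be previewed by Monte Carlo: simulate the distribution of the discounted return and read off risk functionals rather than only its mean (the reward process below is a made-up i.i.d. example, not a market model):

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, horizon, n_paths = 0.95, 200, 20000
rewards = rng.normal(1.0, 2.0, size=(n_paths, horizon))    # r_k ~ N(1, 4), illustrative
G = rewards @ (gamma ** np.arange(horizon))                # discounted return per path

q_mean = G.mean()                        # what the usual Q-value would report
var5 = np.quantile(G, 0.05)              # 5% quantile of the return distribution
cvar5 = G[G <= var5].mean()              # average return over the worst 5% of paths
print(q_mean, var5, cvar5)               # mean near 1/(1 - gamma) = 20; tail well below
```

Two policies with the same mean return Q^π can have very different left tails, which is exactly what a risk-sensitive criterion on the distribution of Φ^π(s_t, a_t) is meant to capture.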
Wrap-Up
Intensive recent literature related to Reinforcement Learning (CS, AI, ML), with a focus on algorithms (Q-learning, see Watkins and Dayan (1992), TD(λ), deep RL, etc.)
Connections with many applications in economics and finance: dynamic programming, operations research, stochastic games, risk measures, etc.
Several recent extensions: inverse reinforcement learning (Miller (1984), Pakes (1986)), distributional reinforcement learning.
See arXiv:2003.10014 for more details and references.
References I
Abel, D. (2019). Concepts in Bounded Rationality: Perspectives from Reinforcement Learning. PhD thesis, Brown University.
Bellemare, M. G., Dabney, W., and Munos, R. (2017). A distributional perspective on reinforcement learning. arXiv:1707.06887.
Bernheim, B. D. (1984). Rationalizable strategic behavior. Econometrica, 52(4):1007–1028.
Brown, G. W. (1951). Iterative solutions of games by fictitious play. In Koopmans, T., editor, Activity Analysis of Production and Allocation, pages 374–376. John Wiley & Sons, Inc.
Chattopadhyay, R. and Duflo, E. (2004). Women as policy makers: Evidence from a randomized policy experiment in India. Econometrica, 72(5):1409–1443.
Denardo, E. V. (1967). Contraction mappings in the theory underlying dynamic programming. SIAM Review, 9(2):165–177.
Hansen, L. P. and Sargent, T. J. (2013). Recursive Models of Dynamic Linear Economies. The Gorman Lectures in Economics. Princeton University Press.
Hellwig, M. F. (1973). Sequential models in economic dynamics. PhD thesis, Massachusetts Institute of Technology, Department of Economics.
References II
Hopkins, E. (2002). Two competing models of how people learn in games. Econometrica, 70(6):2141–2166.
Leimar, O. and McNamara, J. (2019). Learning leads to bounded rationality and the evolution of cognitive bias in public goods games. Nature Scientific Reports, 9:16319.
Lettau, M. and Uhlig, H. (1999). Rules of thumb versus dynamic programming. American Economic Review, 89(1):148–174.
Li, B. and Hoi, S. C. (2014). Online portfolio selection: A survey. ACM Computing Surveys (CSUR), 46(3):1–36.
Ljungqvist, L. and Sargent, T. J. (2018). Recursive Macroeconomic Theory. MIT Press, 4th edition.
Marcet, A. and Sargent, T. J. (1989a). Convergence of least-squares learning in environments with hidden state variables and private information. Journal of Political Economy, 97(6):1306–1322.
Marcet, A. and Sargent, T. J. (1989b). Convergence of least squares learning mechanisms in self-referential linear stochastic models. Journal of Economic Theory, 48(2):337–368.
Maskin, E. and Tirole, J. (1988). A theory of dynamic oligopoly, I: Overview and quantity competition with large fixed costs. Econometrica, 56:549–569.
Miller, R. A. (1984). Job matching and occupational choice. Journal of Political Economy, 92(6):1086–1120.
References III
Pakes, A. (1986). Patents as options: Some estimates of the value of holding European patent stocks. Econometrica, 54(4):755–784.
Pearce, D. G. (1984). Rationalizable strategic behavior and the problem of perfection. Econometrica, 52(4):1029–1050.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527–535.
Rothschild, M. (1974). A two-armed bandit theory of market pricing. Journal of Economic Theory, 9(2):185–202.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294.
Vyetrenko, S. and Xu, S. (2019). Risk-sensitive compact decision trees for autonomous execution in presence of simulated market response. arXiv:1906.02312.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.
Weber, R. (1992). On the Gittins index for multiarmed bandits. The Annals of Applied Probability, 2(4):1024–1033.
Weitzman, M. L. (1979). Optimal search for the best alternative. Econometrica, 47(3):641–654.