Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in Finance

CME 241: Reinforcement Learning for Stochastic
Control Problems in Finance
Ashwin Rao
ICME, Stanford University
Ashwin Rao (Stanford) RL for Finance 1 / 19
Overview of the Course
Theory of Markov Decision Processes (MDPs)
Dynamic Programming (DP) Algorithms (a.k.a. model-based)
Reinforcement Learning (RL) Algorithms (a.k.a. model-free)
Plenty of Python implementations of models and algorithms
Apply these algorithms to 3 Financial/Trading problems:
(Dynamic) Asset-Allocation to maximize utility of Consumption
Optimal Exercise/Stopping of Path-dependent American Options
Optimal Trade Order Execution (managing Price Impact)
By treating each of the problems as MDPs (i.e., Stochastic Control)
We will go over classical/analytical solutions to these problems
Then introduce real-world considerations, and tackle with RL (or DP)
What is the flavor of this course?
An important goal of this course is to effectively blend:
Theory/Mathematics
Programming/Algorithms
Modeling of Real-World Finance/Trading Problems
Pre-Requisites and Housekeeping
Theory Pre-reqs: Optimization, Probability, Pricing, Portfolio Theory
Coding Pre-reqs: Data Structures/Algorithms with numpy/scipy
Alternative: Written test to demonstrate above-listed background
Grade based on Class Participation, 1 Exam, and Programming Work
A passing grade earns 3 credits, which can be applied towards the MCF degree
Wed and Fri 4:30pm - 5:50pm, 01/07/2019 - 03/15/2019
Classes in GES building, room 150
Appointments: Any time Fridays or an hour before class Wednesdays
Use appointments time to discuss theory as well as your code
Resources
I recommend Sutton-Barto as the companion book for this course
I won’t follow the structure of the Sutton-Barto book
But I will follow its approach/treatment
I will follow the structure of David Silver’s RL course
I encourage you to augment my lectures with David’s lecture videos
Occasionally, I will veer away from this flow, or speed up/slow down
We will do a bit more Theory & a lot more coding (relative to above)
You can freely use my open-source code for your coding work
I expect you to duplicate the functionality of the above code in this course
We will go over some classical papers on the Finance applications
To understand in-depth the analytical solutions in simple settings
I will augment the above content with many of my own slides
All of this will be organized on the course web site
The MDP Framework (that RL is based on)
Components of the MDP Framework
The Agent and the Environment interact in a time-sequenced loop
Agent responds to [State, Reward] by taking an Action
Environment responds by producing next step’s (random) State
Environment also produces a (random) scalar denoted as Reward
Goal of Agent is to maximize Expected Sum of all future Rewards
By controlling the (Policy : State → Action) function
Agent often doesn’t know the Model of the Environment
Model refers to state-transition probabilities and reward function
So, Agent has to learn the Model AND learn the Optimal Policy
This is a dynamic (time-sequenced control) system under uncertainty
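The interaction loop above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not code from the course repository: `ToyEnv`, its two-state dynamics, and the naive policy are all hypothetical, chosen only to show the State/Action/Reward cycle.

```python
import random

# Hypothetical toy environment: two states, random transitions.
# The Agent sees a State, picks an Action; the Environment returns
# the next (random) State and a (random) scalar Reward.
class ToyEnv:
    def __init__(self):
        self.state = 0

    def step(self, action):
        self.state = random.choice([0, 1])          # random next State
        reward = 1.0 if action == self.state else 0.0  # random scalar Reward
        return self.state, reward

def run_episode(policy, steps=10):
    """Run the Agent-Environment loop, accumulating Rewards."""
    env = ToyEnv()
    state, total_reward = env.state, 0.0
    for _ in range(steps):
        action = policy(state)           # Policy : State -> Action
        state, reward = env.step(action)
        total_reward += reward           # Agent maximizes expected sum of Rewards
    return total_reward

total = run_episode(lambda s: s)  # naive policy: echo the current state
```

Note that the Agent here never sees the transition probabilities inside `step` — it only observes States and Rewards, which is exactly the model-free setting RL operates in.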
Many real-world problems fit this MDP framework
Self-driving vehicle (speed/steering to optimize safety/time)
Game of Chess (Boolean Reward at end of game)
Complex Logistical Operations (eg: movements in a Warehouse)
Make a humanoid robot walk/run on difficult terrains
Manage an investment portfolio
Control a power station
Optimal decisions during a football game
Strategy to win an election (high-complexity MDP)
Why are these problems hard?
Model of Environment is unknown (learn as you go)
State space can be large or complex (involving many variables)
Sometimes, Action space is also large or complex
No direct feedback on “correct” Actions (only feedback is Reward)
Actions can have delayed consequences (late Rewards)
Time-sequenced complexity (eg: Actions influence future Actions)
Agent Actions need to tradeoff between “explore” and “exploit”
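The explore/exploit tradeoff in the last bullet is commonly handled with an epsilon-greedy rule. The sketch below is a generic illustration (the Q-values are made up), not a method specific to this course:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action; otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit

# Illustrative Q-value estimates for 3 actions; action 1 looks best
q = [0.2, 0.9, 0.5]
counts = [0, 0, 0]
for _ in range(10000):
    counts[epsilon_greedy(q)] += 1
# Action 1 is exploited most of the time, but all actions keep being explored
```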
Why is RL interesting/useful to learn about?
RL solves MDP problem when Environment Model is unknown
Or when State or Action space is too large/complex
Promise of modern A.I. is based on success of RL algorithms
Potential for automated decision-making in real-world business
In 10 years: Bots that act or behave more optimally than humans
RL already solves various low-complexity real-world problems
RL will soon be the most-desired skill in the Data Science job-market
Possibilities in Finance are endless (we cover 3 important problems)
Learning RL is a lot of fun! (interesting in theory as well as coding)
Optimal Asset Allocation to Maximize Consumption Utility
You can invest in (allocate wealth to) a collection of assets
Investment horizon is a fixed length of time
Each risky asset has an unknown distribution of returns
Transaction Costs & Constraints on trading hours/quantities/shorting
Allowed to consume a fraction of your wealth at specific times
Dynamic Decision: Time-Sequenced Allocation & Consumption
To maximize horizon-aggregated Utility of Consumption
Utility function represents degree of risk-aversion
So, we effectively maximize aggregate Risk-Adjusted Consumption
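The link between utility and risk-aversion can be made concrete with CRRA (constant relative risk-aversion) utility, one standard choice — a sketch, not the only utility family the course may use:

```python
import math

def crra_utility(x, gamma):
    """CRRA utility: U(x) = x^(1-gamma) / (1-gamma); log-utility when gamma = 1."""
    if gamma == 1.0:
        return math.log(x)
    return x ** (1.0 - gamma) / (1.0 - gamma)

# Concavity of U encodes risk-aversion: a sure 100 is preferred to a
# fair 50/50 gamble between 50 and 150 (same expected wealth of 100).
sure = crra_utility(100.0, 2.0)
gamble = 0.5 * crra_utility(50.0, 2.0) + 0.5 * crra_utility(150.0, 2.0)
# sure > gamble, so maximizing expected utility penalizes consumption risk
```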
MDP for Optimal Asset Allocation problem
State is [Current Time, Current Holdings, Current Prices]
Action is [Allocation Quantities, Consumption Quantity]
Actions limited by various real-world trading constraints
Reward is Utility of Consumption less Transaction Costs
State-transitions governed by risky asset movements
Optimal Exercise of Path-Dependent American Options
RL is an alternative to Longstaff-Schwartz algorithm for Pricing
State is [Current Time, History of Spot Prices]
Action is Boolean: Exercise (i.e., Payoff and Stop) or Continue
Reward always 0, except upon Exercise (= Payoff)
State-transitions governed by Spot Price Stochastic Process
Optimal Policy ⇒ Optimal Stopping ⇒ Option Price
Can be generalized to other Optimal Stopping problems
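Before turning to RL, the Exercise-or-Continue decision can be seen in the classical lattice setting: backward induction on a Cox-Ross-Rubinstein binomial tree, where each node compares immediate payoff against continuation value. A sketch with illustrative parameters (not the Longstaff-Schwartz algorithm itself, which uses regression on simulated paths):

```python
import math

def american_put_binomial(s0, strike, r, sigma, T, n):
    """Price an American put by backward induction on a CRR binomial tree.
    At each node the optimal stopping rule is max(exercise payoff, continue)."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # Option values at maturity: intrinsic payoff
    values = [max(strike - s0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = max(strike - s0 * u**j * d**(i - j), 0.0)
            values[j] = max(exercise, cont)  # exercise iff payoff beats continuation
    return values[0]

price = american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 100)
```

The RL framing replaces this known lattice model with sampled transitions, which is what makes it viable for path-dependent payoffs where the tree blows up.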
Optimal Trade Order Execution (controlling Price Impact)
You are tasked with selling a large quantity of a (relatively illiquid) stock
You have a fixed horizon over which to complete the sale
Goal is to maximize aggregate sales proceeds over horizon
If you sell too fast, Price Impact will result in poor sales proceeds
If you sell too slow, you risk running out of time
We need to model temporary and permanent Price Impacts
Objective should incorporate penalty for variance of sales proceeds
Which is equivalent to maximizing aggregate Utility of sales proceeds
MDP for Optimal Trade Order Execution
State is [Time Remaining, Stock Remaining to be Sold, Market Info]
Action is Quantity of Stock to Sell at current time
Reward is Utility of Sales Proceeds (i.e., variance-adjusted proceeds)
Reward & State-transitions governed by Price Impact Model
Real-world Model can be quite complex (Order Book Dynamics)
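The sell-fast/sell-slow tension can be shown with the simplest possible impact model. This is a deliberately crude sketch under strong assumptions — linear temporary impact that fully decays between slices, no permanent impact, no risk penalty — nothing like a full Order Book model:

```python
def sale_proceeds(quantities, price=100.0, impact=0.001):
    """Proceeds from a sale schedule under a hypothetical linear temporary
    price-impact model: a slice of size q executes at (price - impact * q),
    and the impact fully decays before the next slice."""
    return sum(q * (price - impact * q) for q in quantities)

lump = sale_proceeds([10000.0])         # sell everything at once
split = sale_proceeds([1000.0] * 10)    # same total quantity over 10 slices
# Splitting the order reduces the impact cost, so split > lump; adding a
# variance penalty (price risk while waiting) is what makes the tradeoff hard
```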
Landmark Papers we will cover in detail
Merton’s solution for Optimal Portfolio Allocation/Consumption
Longstaff-Schwartz Algorithm for Pricing American Options
Bertsimas-Lo paper on Optimal Execution Cost
Almgren-Chriss paper on Optimal Risk-Adjusted Execution Cost
Sutton et al’s proof of Policy Gradient Theorem
Week by Week (Tentative) Schedule
Week 1: Introduction to RL and Overview of Finance Problems
Week 2: Theory of Markov Decision Process & Bellman Equations
Week 3: Dynamic Programming (Policy Iteration, Value Iteration)
Week 4: Model-free Prediction (RL for Value Function estimation)
Week 5: Model-free Control (RL for Optimal Value Function/Policy)
Week 6: RL with Function Approximation (including Deep RL)
Week 7: Policy Gradient Algorithms
Week 8: Optimal Asset Allocation problem
Week 9: Optimal Exercise of American Options problem
Week 10: Optimal Trade Order Execution problem
Sneak Peek into a couple of lectures in this course
Policy Gradient Theorem and Compatible Approximation Theorem
HJB Equation and Merton’s Portfolio Problem
Similar Courses offered at Stanford
CS 234 (Emma Brunskill - Winter 2019)
AA 228/CS 238 (Mykel Kochenderfer - Autumn 2018)
CS 332 (Emma Brunskill - Autumn 2018)
MS&E 351 (Ben Van Roy - Winter 2019)
MS&E 348 (Gerd Infanger - Winter 2020)
MS&E 338 (Ben Van Roy - Spring 2019)
1 sur 19

Recommandé

Principles of Mathematical Economics applied to a Physical-Stores Retail Busi... par
Principles of Mathematical Economics applied to a Physical-Stores Retail Busi...Principles of Mathematical Economics applied to a Physical-Stores Retail Busi...
Principles of Mathematical Economics applied to a Physical-Stores Retail Busi...Ashwin Rao
890 vues24 diapositives
A.I. for Dynamic Decisioning under Uncertainty (for real-world problems in Re... par
A.I. for Dynamic Decisioning under Uncertainty (for real-world problems in Re...A.I. for Dynamic Decisioning under Uncertainty (for real-world problems in Re...
A.I. for Dynamic Decisioning under Uncertainty (for real-world problems in Re...Ashwin Rao
1.2K vues35 diapositives
Towards Improved Pricing and Hedging of Agency Mortgage-backed Securities par
Towards Improved Pricing and Hedging of Agency Mortgage-backed SecuritiesTowards Improved Pricing and Hedging of Agency Mortgage-backed Securities
Towards Improved Pricing and Hedging of Agency Mortgage-backed SecuritiesAshwin Rao
1.2K vues54 diapositives
Chapter 3_OM par
Chapter 3_OMChapter 3_OM
Chapter 3_OMguest537689
6.5K vues103 diapositives
"Deep Q-Learning for Trading" by Dr. Tucker Balch, Professor of Interactive C... par
"Deep Q-Learning for Trading" by Dr. Tucker Balch, Professor of Interactive C..."Deep Q-Learning for Trading" by Dr. Tucker Balch, Professor of Interactive C...
"Deep Q-Learning for Trading" by Dr. Tucker Balch, Professor of Interactive C...Quantopian
905 vues40 diapositives
Using Financial Forecasts to Advise Business - Method of Forecasting - Revised par
Using Financial Forecasts to Advise Business - Method of Forecasting - RevisedUsing Financial Forecasts to Advise Business - Method of Forecasting - Revised
Using Financial Forecasts to Advise Business - Method of Forecasting - RevisedIrma Miller
2.1K vues43 diapositives

Contenu connexe

Tendances

"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and... par
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and..."Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...Quantopian
758 vues13 diapositives
Forecasting Techniques par
Forecasting TechniquesForecasting Techniques
Forecasting Techniquesguest865c0e0c
178.4K vues42 diapositives
Stock Return Forecast - Theory and Empirical Evidence par
Stock Return Forecast - Theory and Empirical EvidenceStock Return Forecast - Theory and Empirical Evidence
Stock Return Forecast - Theory and Empirical EvidenceTai Tran
3.5K vues53 diapositives
Forecasting par
ForecastingForecasting
ForecastingSUDARSHAN KUMAR PATEL
3.5K vues22 diapositives
Demand Forcasting par
Demand ForcastingDemand Forcasting
Demand Forcastingitsvineeth209
4.1K vues31 diapositives
Class notes forecasting par
Class notes forecastingClass notes forecasting
Class notes forecastingArun Kumar
14.7K vues30 diapositives

Tendances(20)

"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and... par Quantopian
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and..."Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
"Trading Without Regret" by Dr. Michael Kearns, Professor at the Computer and...
Quantopian758 vues
Forecasting Techniques par guest865c0e0c
Forecasting TechniquesForecasting Techniques
Forecasting Techniques
guest865c0e0c178.4K vues
Stock Return Forecast - Theory and Empirical Evidence par Tai Tran
Stock Return Forecast - Theory and Empirical EvidenceStock Return Forecast - Theory and Empirical Evidence
Stock Return Forecast - Theory and Empirical Evidence
Tai Tran3.5K vues
Class notes forecasting par Arun Kumar
Class notes forecastingClass notes forecasting
Class notes forecasting
Arun Kumar14.7K vues
Arbitrage pricing theory & Efficient market hypothesis par Hari Ram
Arbitrage pricing theory & Efficient market hypothesisArbitrage pricing theory & Efficient market hypothesis
Arbitrage pricing theory & Efficient market hypothesis
Hari Ram546 vues
Modeling Transaction Costs for Algorithmic Strategies par Quantopian
Modeling Transaction Costs for Algorithmic StrategiesModeling Transaction Costs for Algorithmic Strategies
Modeling Transaction Costs for Algorithmic Strategies
Quantopian5.8K vues
An Empirical Investigation Of The Arbitrage Pricing Theory par Akhil Goyal
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing Theory
Akhil Goyal3.8K vues
Automatic algorithms for time series forecasting par Rob Hyndman
Automatic algorithms for time series forecastingAutomatic algorithms for time series forecasting
Automatic algorithms for time series forecasting
Rob Hyndman4.6K vues
Forecasting and methods of forecasting par Milind Pelagade
Forecasting and methods of forecastingForecasting and methods of forecasting
Forecasting and methods of forecasting
Milind Pelagade18.9K vues
Forecasting par SVGANGAD
ForecastingForecasting
Forecasting
SVGANGAD1.3K vues
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin... par Quantopian
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin..."Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
"Enhancing Statistical Significance of Backtests" by Dr. Ernest Chan, Managin...
Quantopian892 vues

Similaire à Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in Finance

Quantitative management par
Quantitative managementQuantitative management
Quantitative managementsmumbahelp
413 vues7 diapositives
Operation research history and overview application limitation par
Operation research history and overview application limitationOperation research history and overview application limitation
Operation research history and overview application limitationBalaji P
15.9K vues39 diapositives
decision making in Lp par
decision making in Lp decision making in Lp
decision making in Lp Dronak Sahu
865 vues30 diapositives
Supply Chain Management - Optimization technology par
Supply Chain Management - Optimization technologySupply Chain Management - Optimization technology
Supply Chain Management - Optimization technologyNurhazman Abdul Aziz
3.4K vues20 diapositives
Fdp session rtu session 1 par
Fdp session rtu session 1Fdp session rtu session 1
Fdp session rtu session 1sprsingh1
62 vues35 diapositives
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2... par
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...QuantInsti
2.7K vues28 diapositives

Similaire à Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in Finance(20)

Quantitative management par smumbahelp
Quantitative managementQuantitative management
Quantitative management
smumbahelp413 vues
Operation research history and overview application limitation par Balaji P
Operation research history and overview application limitationOperation research history and overview application limitation
Operation research history and overview application limitation
Balaji P15.9K vues
decision making in Lp par Dronak Sahu
decision making in Lp decision making in Lp
decision making in Lp
Dronak Sahu865 vues
Fdp session rtu session 1 par sprsingh1
Fdp session rtu session 1Fdp session rtu session 1
Fdp session rtu session 1
sprsingh162 vues
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2... par QuantInsti
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
Can we use Mixture Models to Predict Market Bottoms? by Brian Christopher - 2...
QuantInsti2.7K vues
The Environmental Effect Of Operational Planning And... par Kim Moore
The Environmental Effect Of Operational Planning And...The Environmental Effect Of Operational Planning And...
The Environmental Effect Of Operational Planning And...
Kim Moore3 vues
CA02CA3103 RMTLPP Formulation.pdf par MinawBelay
CA02CA3103 RMTLPP Formulation.pdfCA02CA3103 RMTLPP Formulation.pdf
CA02CA3103 RMTLPP Formulation.pdf
MinawBelay51 vues
080324 Miller Opportunity Matrix Asq Presentation par rwmill9716
080324 Miller Opportunity Matrix Asq Presentation080324 Miller Opportunity Matrix Asq Presentation
080324 Miller Opportunity Matrix Asq Presentation
rwmill9716572 vues
Operations research ppt par raaz kumar
Operations research pptOperations research ppt
Operations research ppt
raaz kumar34.1K vues
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx par felicidaddinwoodie
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
Sfeldman performance bb_worldemea07 par Steve Feldman
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07
Steve Feldman1K vues
Comparing adaptive management and real options approaches: slides and pre-print par iadine Chades
Comparing adaptive management and real options approaches: slides and pre-printComparing adaptive management and real options approaches: slides and pre-print
Comparing adaptive management and real options approaches: slides and pre-print
iadine Chades5.6K vues
B2 2006 sizing_benchmarking (1) par Steve Feldman
B2 2006 sizing_benchmarking (1)B2 2006 sizing_benchmarking (1)
B2 2006 sizing_benchmarking (1)
Steve Feldman281 vues

Plus de Ashwin Rao

Stochastic Control/Reinforcement Learning for Optimal Market Making par
Stochastic Control/Reinforcement Learning for Optimal Market MakingStochastic Control/Reinforcement Learning for Optimal Market Making
Stochastic Control/Reinforcement Learning for Optimal Market MakingAshwin Rao
1.5K vues30 diapositives
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search par
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree SearchAdaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree SearchAshwin Rao
220 vues7 diapositives
Fundamental Theorems of Asset Pricing par
Fundamental Theorems of Asset PricingFundamental Theorems of Asset Pricing
Fundamental Theorems of Asset PricingAshwin Rao
2K vues38 diapositives
Evolutionary Strategies as an alternative to Reinforcement Learning par
Evolutionary Strategies as an alternative to Reinforcement LearningEvolutionary Strategies as an alternative to Reinforcement Learning
Evolutionary Strategies as an alternative to Reinforcement LearningAshwin Rao
327 vues7 diapositives
Understanding Dynamic Programming through Bellman Operators par
Understanding Dynamic Programming through Bellman OperatorsUnderstanding Dynamic Programming through Bellman Operators
Understanding Dynamic Programming through Bellman OperatorsAshwin Rao
636 vues11 diapositives
Stochastic Control of Optimal Trade Order Execution par
Stochastic Control of Optimal Trade Order ExecutionStochastic Control of Optimal Trade Order Execution
Stochastic Control of Optimal Trade Order ExecutionAshwin Rao
759 vues16 diapositives

Plus de Ashwin Rao(20)

Stochastic Control/Reinforcement Learning for Optimal Market Making par Ashwin Rao
Stochastic Control/Reinforcement Learning for Optimal Market MakingStochastic Control/Reinforcement Learning for Optimal Market Making
Stochastic Control/Reinforcement Learning for Optimal Market Making
Ashwin Rao1.5K vues
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search par Ashwin Rao
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree SearchAdaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Adaptive Multistage Sampling Algorithm: The Origins of Monte Carlo Tree Search
Ashwin Rao220 vues
Fundamental Theorems of Asset Pricing par Ashwin Rao
Fundamental Theorems of Asset PricingFundamental Theorems of Asset Pricing
Fundamental Theorems of Asset Pricing
Ashwin Rao2K vues
Evolutionary Strategies as an alternative to Reinforcement Learning par Ashwin Rao
Evolutionary Strategies as an alternative to Reinforcement LearningEvolutionary Strategies as an alternative to Reinforcement Learning
Evolutionary Strategies as an alternative to Reinforcement Learning
Ashwin Rao327 vues
Understanding Dynamic Programming through Bellman Operators par Ashwin Rao
Understanding Dynamic Programming through Bellman OperatorsUnderstanding Dynamic Programming through Bellman Operators
Understanding Dynamic Programming through Bellman Operators
Ashwin Rao636 vues
Stochastic Control of Optimal Trade Order Execution par Ashwin Rao
Stochastic Control of Optimal Trade Order ExecutionStochastic Control of Optimal Trade Order Execution
Stochastic Control of Optimal Trade Order Execution
Ashwin Rao759 vues
Overview of Stochastic Calculus Foundations par Ashwin Rao
Overview of Stochastic Calculus FoundationsOverview of Stochastic Calculus Foundations
Overview of Stochastic Calculus Foundations
Ashwin Rao375 vues
Risk-Aversion, Risk-Premium and Utility Theory par Ashwin Rao
Risk-Aversion, Risk-Premium and Utility TheoryRisk-Aversion, Risk-Premium and Utility Theory
Risk-Aversion, Risk-Premium and Utility Theory
Ashwin Rao3.4K vues
Value Function Geometry and Gradient TD par Ashwin Rao
Value Function Geometry and Gradient TDValue Function Geometry and Gradient TD
Value Function Geometry and Gradient TD
Ashwin Rao579 vues
HJB Equation and Merton's Portfolio Problem par Ashwin Rao
HJB Equation and Merton's Portfolio ProblemHJB Equation and Merton's Portfolio Problem
HJB Equation and Merton's Portfolio Problem
Ashwin Rao1K vues
Policy Gradient Theorem par Ashwin Rao
Policy Gradient TheoremPolicy Gradient Theorem
Policy Gradient Theorem
Ashwin Rao2.9K vues
A Quick and Terse Introduction to Efficient Frontier Mathematics par Ashwin Rao
A Quick and Terse Introduction to Efficient Frontier MathematicsA Quick and Terse Introduction to Efficient Frontier Mathematics
A Quick and Terse Introduction to Efficient Frontier Mathematics
Ashwin Rao831 vues
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network par Ashwin Rao
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural NetworkRecursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Recursive Formulation of Gradient in a Dense Feed-Forward Deep Neural Network
Ashwin Rao560 vues
Demystifying the Bias-Variance Tradeoff par Ashwin Rao
Demystifying the Bias-Variance TradeoffDemystifying the Bias-Variance Tradeoff
Demystifying the Bias-Variance Tradeoff
Ashwin Rao2.4K vues
Category Theory made easy with (ugly) pictures par Ashwin Rao
Category Theory made easy with (ugly) picturesCategory Theory made easy with (ugly) pictures
Category Theory made easy with (ugly) pictures
Ashwin Rao1.6K vues
Risk Pooling sensitivity to Correlation par Ashwin Rao
Risk Pooling sensitivity to CorrelationRisk Pooling sensitivity to Correlation
Risk Pooling sensitivity to Correlation
Ashwin Rao349 vues
Abstract Algebra in 3 Hours par Ashwin Rao
Abstract Algebra in 3 HoursAbstract Algebra in 3 Hours
Abstract Algebra in 3 Hours
Ashwin Rao1.6K vues
OmniChannelNewsvendor par Ashwin Rao
OmniChannelNewsvendorOmniChannelNewsvendor
OmniChannelNewsvendor
Ashwin Rao490 vues
The Newsvendor meets the Options Trader par Ashwin Rao
The Newsvendor meets the Options TraderThe Newsvendor meets the Options Trader
The Newsvendor meets the Options Trader
Ashwin Rao1.1K vues
The Fuss about || Haskell | Scala | F# || par Ashwin Rao
The Fuss about || Haskell | Scala | F# ||The Fuss about || Haskell | Scala | F# ||
The Fuss about || Haskell | Scala | F# ||
Ashwin Rao1.5K vues

Dernier

Design of machine elements-UNIT 3.pptx par
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptxgopinathcreddy
37 vues31 diapositives
_MAKRIADI-FOTEINI_diploma thesis.pptx par
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptxfotinimakriadi
12 vues32 diapositives
Global airborne satcom market report par
Global airborne satcom market reportGlobal airborne satcom market report
Global airborne satcom market reportdefencereport78
6 vues13 diapositives
Web Dev Session 1.pptx par
Web Dev Session 1.pptxWeb Dev Session 1.pptx
Web Dev Session 1.pptxVedVekhande
17 vues22 diapositives
Pitchbook Repowerlab.pdf par
Pitchbook Repowerlab.pdfPitchbook Repowerlab.pdf
Pitchbook Repowerlab.pdfVictoriaGaleano
6 vues12 diapositives
START Newsletter 3 par
START Newsletter 3START Newsletter 3
START Newsletter 3Start Project
7 vues25 diapositives

Dernier(20)

Design of machine elements-UNIT 3.pptx par gopinathcreddy
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptx
gopinathcreddy37 vues
_MAKRIADI-FOTEINI_diploma thesis.pptx par fotinimakriadi
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptx
fotinimakriadi12 vues
REACTJS.pdf par ArthyR3
REACTJS.pdfREACTJS.pdf
REACTJS.pdf
ArthyR337 vues
Créativité dans le design mécanique à l’aide de l’optimisation topologique par LIEGE CREATIVE
Créativité dans le design mécanique à l’aide de l’optimisation topologiqueCréativité dans le design mécanique à l’aide de l’optimisation topologique
Créativité dans le design mécanique à l’aide de l’optimisation topologique
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx par lwang78
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
2023Dec ASU Wang NETR Group Research Focus and Facility Overview.pptx
lwang78180 vues
MongoDB.pdf par ArthyR3
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
ArthyR349 vues
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf par AlhamduKure
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
AlhamduKure8 vues
BCIC - Manufacturing Conclave - Technology-Driven Manufacturing for Growth par Innomantra
BCIC - Manufacturing Conclave -  Technology-Driven Manufacturing for GrowthBCIC - Manufacturing Conclave -  Technology-Driven Manufacturing for Growth
BCIC - Manufacturing Conclave - Technology-Driven Manufacturing for Growth
Innomantra 15 vues
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc... par csegroupvn
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
csegroupvn8 vues
Ansari: Practical experiences with an LLM-based Islamic Assistant par M Waleed Kadous
Ansari: Practical experiences with an LLM-based Islamic AssistantAnsari: Practical experiences with an LLM-based Islamic Assistant
Ansari: Practical experiences with an LLM-based Islamic Assistant

Stanford CME 241 - Reinforcement Learning for Stochastic Control Problems in Finance

  • 1. CME 241: Reinforcement Learning for Stochastic Control Problems in Finance Ashwin Rao ICME, Stanford University Ashwin Rao (Stanford) RL for Finance 1 / 19
  • 2. Overview of the Course Theory of Markov Decision Processes (MDPs) Dynamic Programming (DP) Algorithms (a.k.a. model-based) Reinforcement Learning (RL) Algorithms (a.k.a. model-free) Plenty of Python implementations of models and algorithms Apply these algorithms to 3 Financial/Trading problems: (Dynamic) Asset-Allocation to maximize utility of Consumption Optimal Exercise/Stopping of Path-dependent American Options Optimal Trade Order Execution (managing Price Impact) By treating each of the problems as MDPs (i.e., Stochastic Control) We will go over classical/analytical solutions to these problems Then introduce real-world considerations, and tackle with RL (or DP) Ashwin Rao (Stanford) RL for Finance 2 / 19
  • 3. What is the flavor of this course? An important goal of this course is to effectively blend: Theory/Mathematics Programming/Algorithms Modeling of Read-World Finance/Trading Problems Ashwin Rao (Stanford) RL for Finance 3 / 19
  • 4. Pre-Requisites and Housekeeping Theory Pre-reqs: Optimization, Probability, Pricing, Portfolio Theory Coding Pre-reqs: Data Structures/Algorithms with numpy/scipy Alternative: Written test to demonstrate above-listed background Grade based on Class Participation, 1 Exam, and Programming Work Passing grade fetches 3 credits, can be applied towards MCF degree Wed and Fri 4:30pm - 5:50pm, 01/07/2019 - 03/15/2019 Classes in GES building, room 150 Appointments: Any time Fridays or an hour before class Wednesdays Use appointments time to discuss theory as well as your code Ashwin Rao (Stanford) RL for Finance 4 / 19
  • 5. Resources I recommend Sutton-Barto as the companion book for this course I won’t follow the structure of Sutton-Barto book But I will follow his approach/treatment I will follow the structure of David Silver’s RL course I encourage you to augment my lectures with David’s lecture videos Occasionally, I will wear away or speed up/slow down from this flow We will do a bit more Theory & a lot more coding (relative to above) You can freely use my open-source code for your coding work I expect you to duplicate the functionality of above code in this course We will go over some classical papers on the Finance applications To understand in-depth the analytical solutions in simple settings I will augment the above content with many of my own slides All of this will be organized on the course web site Ashwin Rao (Stanford) RL for Finance 5 / 19
  • 6. The MDP Framework (that RL is based on) Ashwin Rao (Stanford) RL for Finance 6 / 19
  • 7. Components of the MDP Framework The Agent and the Environment interact in a time-sequenced loop Agent responds to [State, Reward] by taking an Action Environment responds by producing next step’s (random) State Environment also produces a (random) scalar denoted as Reward Goal of Agent is to maximize Expected Sum of all future Rewards By controlling the (Policy : State → Action) function Agent often doesn’t know the Model of the Environment Model refers to state-transition probabilities and reward function So, Agent has to learn the Model AND learn the Optimal Policy This is a dynamic (time-sequenced control) system under uncertainty Ashwin Rao (Stanford) RL for Finance 7 / 19
  • 8. Many real-world problems fit this MDP framework Self-driving vehicle (speed/steering to optimize safety/time) Game of Chess (Boolean Reward at end of game) Complex Logistical Operations (eg: movements in a Warehouse) Make a humanoid robot walk/run on difficult terrains Manage an investment portfolio Control a power station Optimal decisions during a football game Strategy to win an election (high-complexity MDP) Ashwin Rao (Stanford) RL for Finance 8 / 19
Why are these problems hard?
Model of Environment is unknown (learn as you go)
State space can be large or complex (involving many variables)
Sometimes, Action space is also large or complex
No direct feedback on “correct” Actions (only feedback is Reward)
Actions can have delayed consequences (late Rewards)
Time-sequenced complexity (e.g., Actions influence future Actions)
Agent Actions need to tradeoff between “explore” and “exploit”
Ashwin Rao (Stanford) RL for Finance 9 / 19
Why is RL interesting/useful to learn about?
RL solves the MDP problem when the Environment Model is unknown
Or when the State or Action space is too large/complex
Promise of modern A.I. is based on success of RL algorithms
Potential for automated decision-making in real-world business
In 10 years: Bots that act or behave more optimally than humans
RL already solves various low-complexity real-world problems
RL will soon be the most-desired skill in the Data Science job-market
Possibilities in Finance are endless (we cover 3 important problems)
Learning RL is a lot of fun! (interesting in theory as well as coding)
Ashwin Rao (Stanford) RL for Finance 10 / 19
Optimal Asset Allocation to Maximize Consumption Utility
You can invest in (allocate wealth to) a collection of assets
Investment horizon is a fixed length of time
Each risky asset has an unknown distribution of returns
Transaction Costs & Constraints on trading hours/quantities/shorting
Allowed to consume a fraction of your wealth at specific times
Dynamic Decision: Time-Sequenced Allocation & Consumption
To maximize horizon-aggregated Utility of Consumption
Utility function represents degree of risk-aversion
So, we effectively maximize aggregate Risk-Adjusted Consumption
Ashwin Rao (Stanford) RL for Finance 11 / 19
MDP for Optimal Asset Allocation problem
State is [Current Time, Current Holdings, Current Prices]
Action is [Allocation Quantities, Consumption Quantity]
Actions limited by various real-world trading constraints
Reward is Utility of Consumption less Transaction Costs
State-transitions governed by risky asset movements
Ashwin Rao (Stanford) RL for Finance 12 / 19
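The State, Action, and Reward spec above can be written down as simple Python types. This is a sketch with illustrative field names (not the course's actual code), just to make the MDP ingredients concrete:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class AssetAllocState:
    time: int                      # Current Time
    holdings: Tuple[float, ...]    # Current Holdings in each asset
    prices: Tuple[float, ...]      # Current Prices of each asset

@dataclass(frozen=True)
class AssetAllocAction:
    allocations: Tuple[float, ...]  # Allocation Quantities per asset
    consumption: float              # Consumption Quantity

def reward(utility_of_consumption: float, transaction_costs: float) -> float:
    # Reward is Utility of Consumption less Transaction Costs
    return utility_of_consumption - transaction_costs

s = AssetAllocState(time=0, holdings=(10.0, 5.0), prices=(100.0, 50.0))
a = AssetAllocAction(allocations=(8.0, 7.0), consumption=2.0)
print(reward(1.5, 0.2))
```

The state-transition function (how holdings and prices evolve given an allocation) is the part governed by the risky asset movements, and is exactly what the Agent may not know in the RL setting.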
Optimal Exercise of Path-Dependent American Options
RL is an alternative to Longstaff-Schwartz algorithm for Pricing
State is [Current Time, History of Spot Prices]
Action is Boolean: Exercise (i.e., Payoff and Stop) or Continue
Reward always 0, except upon Exercise (= Payoff)
State-transitions governed by Spot Price Stochastic Process
Optimal Policy ⇒ Optimal Stopping ⇒ Option Price
Can be generalized to other Optimal Stopping problems
Ashwin Rao (Stanford) RL for Finance 13 / 19
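To see the Optimal Stopping structure concretely, here is the classical binomial-lattice backward induction for a (non-path-dependent) American put: at each node we take the max of the exercise payoff and the discounted continuation value. This is the textbook lattice method with made-up parameters, not Longstaff-Schwartz and not the course code:

```python
import math

def american_put_binomial(S0, K, r, sigma, T, n):
    """Price an American put on a CRR binomial tree with n steps."""
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # Option values at expiry: just the payoff
    values = [max(K - S0 * u**j * d**(n - j), 0.0) for j in range(n + 1)]
    # Backward induction: value = max(exercise now, continue)
    for i in range(n - 1, -1, -1):
        values = [
            max(K - S0 * u**j * d**(i - j),                         # exercise
                disc * (p * values[j + 1] + (1 - p) * values[j]))   # continue
            for j in range(i + 1)
        ]
    return values[0]

price = american_put_binomial(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, n=200)
print(round(price, 4))
```

The "Reward always 0, except upon Exercise" structure is visible here: value flows backward only from payoff nodes. Path-dependent payoffs break this simple lattice (the State must carry the price history), which is what motivates RL approaches.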
Optimal Trade Order Execution (controlling Price Impact)
You are tasked with selling a large quantity of a (relatively less-liquid) stock
You have a fixed horizon over which to complete the sale
Goal is to maximize aggregate sales proceeds over the horizon
If you sell too fast, Price Impact will result in poor sales proceeds
If you sell too slow, you risk running out of time
We need to model temporary and permanent Price Impacts
Objective should incorporate penalty for variance of sales proceeds
Which is equivalent to maximizing aggregate Utility of sales proceeds
Ashwin Rao (Stanford) RL for Finance 14 / 19
MDP for Optimal Trade Order Execution
State is [Time Remaining, Stock Remaining to be Sold, Market Info]
Action is Quantity of Stock to Sell at the current time
Reward is Utility of Sales Proceeds (i.e., variance-adjusted proceeds)
Reward & State-transitions governed by the Price Impact Model
Real-world Model can be quite complex (Order Book Dynamics)
Ashwin Rao (Stanford) RL for Finance 15 / 19
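The sell-too-fast trade-off can be illustrated with a toy linear Price Impact model. The impact parameters `perm` and `temp` and the function below are assumed for illustration only (this is far simpler than a full Almgren-Chriss model and ignores the variance penalty):

```python
def sale_proceeds(schedule, p0=100.0, perm=0.01, temp=0.05):
    """Total proceeds from selling per the given schedule of quantities."""
    price, proceeds = p0, 0.0
    for qty in schedule:
        exec_price = price - temp * qty   # temporary impact hits this trade only
        proceeds += qty * exec_price
        price -= perm * qty               # permanent impact persists afterwards
    return proceeds

total = 100.0
fast = sale_proceeds([total])              # dump everything in one trade
slow = sale_proceeds([total / 10] * 10)    # spread the sale over 10 periods
print(fast, slow)
```

Under these assumed parameters the spread-out schedule earns more (the temporary impact is quadratic in trade size), which is the "sell too fast" penalty; adding price randomness and a variance penalty is what turns the schedule choice into the stochastic control problem above.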
Landmark Papers we will cover in detail
Merton’s solution for Optimal Portfolio Allocation/Consumption
Longstaff-Schwartz Algorithm for Pricing American Options
Bertsimas-Lo paper on Optimal Execution Cost
Almgren-Chriss paper on Optimal Risk-Adjusted Execution Cost
Sutton et al.’s proof of the Policy Gradient Theorem
Ashwin Rao (Stanford) RL for Finance 16 / 19
Week by Week (Tentative) Schedule
Week 1: Introduction to RL and Overview of Finance Problems
Week 2: Theory of Markov Decision Processes & Bellman Equations
Week 3: Dynamic Programming (Policy Iteration, Value Iteration)
Week 4: Model-free Prediction (RL for Value Function estimation)
Week 5: Model-free Control (RL for Optimal Value Function/Policy)
Week 6: RL with Function Approximation (including Deep RL)
Week 7: Policy Gradient Algorithms
Week 8: Optimal Asset Allocation problem
Week 9: Optimal Exercise of American Options problem
Week 10: Optimal Trade Order Execution problem
Ashwin Rao (Stanford) RL for Finance 17 / 19
Sneak Peek into a couple of lectures in this course
Policy Gradient Theorem and Compatible Approximation Theorem
HJB Equation and Merton’s Portfolio Problem
Ashwin Rao (Stanford) RL for Finance 18 / 19
Similar Courses offered at Stanford
CS 234 (Emma Brunskill - Winter 2019)
AA 228/CS 238 (Mykel Kochenderfer - Autumn 2018)
CS 332 (Emma Brunskill - Autumn 2018)
MS&E 351 (Ben Van Roy - Winter 2019)
MS&E 348 (Gerd Infanger - Winter 2020)
MS&E 338 (Ben Van Roy - Spring 2019)
Ashwin Rao (Stanford) RL for Finance 19 / 19