Reinforcement learning for trading with Python
December 21, 2018 はんなりPython
Taku Yoshioka
• A previous work on GitHub: q-trader
• Q-learning
• Trading model
• State, action and reward
• Implementation
• Results
https://quantdare.com/deep-reinforcement-trading/
edwardhdlu/q-trader
• An implementation of Q-learning applied to (short-term) stock trading
Action value function (Q-value)
• Expected discounted cumulative future reward, given the current state and action and then following the optimal policy (a mapping from state to action):
$$Q(s_t, a_t) = \mathbb{E}\left[\,\sum_{k=0}^{\infty} \gamma^k\, r_{t+k} \;\middle|\; s = s_t,\ a = a_t\right]$$

$\gamma$: discount factor
$t$: time step
$s_t$: state
$a_t$: action
$r_t$: (immediate) reward
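For intuition, with $\gamma = 0.99$ (an illustrative value, not from the slides) the expectation unrolls as a geometrically discounted sum,

$$Q(s_t, a_t) = \mathbb{E}\left[\, r_t + 0.99\, r_{t+1} + 0.99^2\, r_{t+2} + \cdots \right],$$

so a reward 100 steps ahead still contributes about $0.99^{100} \approx 0.37$ of its value.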
Q-learning
• A reinforcement learning algorithm for learning the optimal Q-value
$$Q^{\mathrm{new}}(s_t, a_t) = (1 - \alpha)\, Q^{\mathrm{old}}(s_t, a_t) + \alpha \left( r_t + \gamma \max_a Q(s_{t+1}, a) \right)$$

$\alpha \in (0, 1)$: learning rate
• To apply Q-learning, collect samples in episodes, represented as tuples $(s_t, a_t, r_t, s_{t+1})$
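A minimal tabular sketch of this update rule (illustrative only; the state and action spaces here are hypothetical, and the deck's trader replaces the table with a neural network):

import numpy as np

n_states, n_actions = 16, 3
alpha, gamma = 0.1, 0.99              # learning rate and discount factor
Q = np.zeros((n_states, n_actions))   # tabular Q-value

def q_update(s_t, a_t, r_t, s_next):
    # Q_new = (1 - alpha) * Q_old + alpha * (r + gamma * max_a Q(s', a))
    target = r_t + gamma * Q[s_next].max()
    Q[s_t, a_t] = (1 - alpha) * Q[s_t, a_t] + alpha * target

# One collected transition (s_t, a_t, r_t, s_{t+1})
q_update(s_t=0, a_t=1, r_t=-10.0, s_next=1)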
Deep Q-learning
• Representing the action value function with a deep network and minimizing the loss function

$$L = \sum_{t \in D} \left( Q(s_t, a_t) - y_t \right)^2, \qquad y_t = r_t + \gamma \max_a Q(s_{t+1}, a)$$

$D$: minibatch
Note
• In contrast to supervised learning, the target value involves the current network outputs, so the network parameters should be updated gradually
• The order of samples in minibatches can be randomized to decorrelate sequential samples (replay buffer)
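A numpy sketch of this loss over one minibatch $D$, assuming a hypothetical q_net(s) that returns the Q-values of all actions (the deck's actual Keras version appears in agent.py below):

import numpy as np

def td_targets(q_net, batch, gamma=0.995):
    # y_t = r_t + gamma * max_a Q(s_{t+1}, a); just r_t on terminal steps
    return np.array([r if done else r + gamma * np.max(q_net(s_next))
                     for s, a, r, s_next, done in batch])

def dqn_loss(q_net, batch, gamma=0.995):
    # Sum of squared TD errors over the minibatch
    ys = td_targets(q_net, batch, gamma)
    qs = np.array([q_net(s)[a] for s, a, r, s_next, done in batch])
    return np.sum((qs - ys) ** 2)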
Trading model
• Based on closing prices
• Three actions: buy (1 unit), sell (1 unit), sit
• Ignoring transaction fee
• Immediate transaction
• Limited number of units we can keep
State
• State: n-step time window of the 1-step differences
of past stock prices
• Sigmoid function to handle the input scaling issue
$$s_t = (d_{t-\tau+1}, \cdots, d_{t-1}, d_t), \qquad d_t = \mathrm{sigmoid}(p_t - p_{t-1})$$

$p_t$: price at time $t$ (discrete)
$\tau$: time window size (steps)
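A small numpy illustration of this preprocessing with made-up prices (the deck's real implementation is getState in environment.py, shown later):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

prices = np.array([100.0, 101.5, 101.0, 102.3])  # hypothetical closing prices
diffs = np.diff(prices)                          # 1-step differences p_t - p_{t-1}
state = sigmoid(diffs)                           # squashed into (0, 1)
print(state)                                     # approx. [0.818 0.378 0.786]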
Reward
• Depending on action
• Sit: 0
• Buy: a negative constant (configurable)
• Sell: profit (sell price - bought price)
This needs to be appropriately designed
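A compact paraphrase of this reward rule as a sketch (the full logic, including the inventory limit and optional reward clipping, is in environment.py below):

def reward_for(action, bought_price=None, sell_price=None, reward_for_buy=-10):
    # 0: sit, 1: buy, 2: sell; reward_for_buy is the configurable constant
    if action == 0:
        return 0
    if action == 1:
        return reward_for_buy
    return sell_price - bought_price  # realized profit on sell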
Implementation
• Training (train.py)
Initialization, sample collection, training loop
• Environment (environment.py)
Loading stock data, making state transition with reward
• Agent (agent.py)
Feature extraction, Q-learning, ε-greedy exploration, saving model
• Results examination (learning_curve.py,
plot_learning_curve.py, evaluate.py)
Compute learning curve for given data, plotting
Configuration
• For comparison of various experimental settings
stock_name: ^GSPC
window_size: 10
episode_count: 500
result_dir: ./models/config
batch_size: 32
clip_reward: True
reward_for_buy: -10
gamma: 0.995
learning_rate: 0.001
optimizer: Adam
inventory_max: 10
Parsing config file
• Get parameters as a dict

import sys

from ruamel.yaml import YAML

with open(sys.argv[1]) as f:
    yaml = YAML()
    config = yaml.load(f)

stock_name = config["stock_name"]
window_size = config["window_size"]
episode_count = config["episode_count"]
result_dir = config["result_dir"]
batch_size = config["batch_size"]
train.py
• Instantiate the RL agent and environment for trading

stock_name, window_size, episode_count = (
    sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
batch_size = 32

agent = Agent(window_size)

# Environment
env = SimpleTradeEnv(stock_name, window_size, agent)
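For the YAML-configured runs of the previous slide, the instantiation would presumably read the same names from the parsed config dict instead of sys.argv; a sketch under that assumption, using only the constructor arguments shown on the following slides:

agent = Agent(window_size, gamma=config["gamma"],
              learning_rate=config["learning_rate"],
              optimizer=config["optimizer"])
env = SimpleTradeEnv(stock_name, window_size, agent,
                     inventory_max=config["inventory_max"],
                     clip_reward=config["clip_reward"],
                     reward_for_buy=config["reward_for_buy"])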
# Loop over episodes
for e in range(episode_count + 1):
    # Initialization before starting an episode
    state = env.reset()
    agent.inventory = []
    done = False

    # Loop in an episode
    while not done:
        action = agent.act(state)
        next_state, reward, done, _ = env.step(action)
        agent.memory.append((state, action, reward, next_state, done))
        state = next_state

        if len(agent.memory) > batch_size:
            agent.expReplay(batch_size)

    if e % 10 == 0:
        agent.model.save("models/model_ep" + str(e))
• Loop over episodes
• Collect samples and train the model (expReplay)
• Save the model every 10 episodes
environment.py
• Loading stock data
class SimpleTradeEnv(object):
    def __init__(self, stock_name, window_size, agent, inventory_max,
                 clip_reward=True, reward_for_buy=-20, print_trade=True):
        self.data = getStockDataVec(stock_name)
        self.window_size = window_size
        self.agent = agent
        self.print_trade = print_trade
        self.reward_for_buy = reward_for_buy
        self.clip_reward = clip_reward
        self.inventory_max = inventory_max
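The reset() method called from train.py is not shown on the slides; a plausible minimal sketch, inferred from how self.t and self.total_profit are used in step():

def reset(self):
    # Rewind to the start of the data and return the initial state
    self.t = 0
    self.total_profit = 0
    return getState(self.data, self.t, self.window_size + 1, self.agent)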
• Computing reward for action and making state
transition
def step(self, action):
    # 0: Sit
    # 1: Buy
    # 2: Sell
    assert(action in (0, 1, 2))

    # Reward
    if action == 0:
        reward = 0
    elif action == 1:
        # Following slide
    else:
        if len(self.agent.inventory) > 0:
            # Following slide

    # State transition
    next_state = getState(self.data, self.t + 1, self.window_size + 1,
                          self.agent)
    done = True if self.t == len(self.data) - 2 else False
    self.t += 1

    return next_state, reward, done, {}
• Reward for buy
elif action == 1:
    if len(self.agent.inventory) < self.inventory_max:
        reward = self.reward_for_buy
        self.agent.inventory.append(self.data[self.t])
        if self.print_trade:
            print("Buy: " + formatPrice(self.data[self.t]))
    else:
        reward = 0
        if self.print_trade:
            print("Buy: not possible")
• Reward for sell
else:
    if len(self.agent.inventory) > 0:
        bought_price = self.agent.inventory.pop(0)
        profit = self.data[self.t] - bought_price
        reward = max(profit, 0) if self.clip_reward else profit
        self.total_profit += profit
        if self.print_trade:
            print("Sell: " + formatPrice(self.data[self.t]) +
                  " | Profit: " + formatPrice(reward))
# Returns an n-day state representation ending at time t
def getState(data, t, n, agent):
    d = t - n + 1
    block = data[d:t + 1] if d >= 0 else -d * [data[0]] + data[0:t + 1]  # pad with t0
    res = []
    for i in range(n - 1):
        res.append(sigmoid(block[i + 1] - block[i]))
    return agent.modify_state(np.array([res]))
• Computing 1-step differences of time series of
stock prices
• Adding a flag indicating whether the agent holds inventory
def modify_state(self, state):
    if len(self.inventory) > 0:
        state = np.hstack((state, [[1]]))
    else:
        state = np.hstack((state, [[0]]))
    return state
(in agent.py)
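getState relies on a sigmoid helper that the slides do not show; a one-liner consistent with $d_t = \mathrm{sigmoid}(p_t - p_{t-1})$ above:

import math

def sigmoid(x):
    # Squash a price difference into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))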
agent.py
• Implements RL agent
• Loads a pre-trained model for evaluation (is_eval)
class Agent:
    def __init__(self, state_size, is_eval=False, model_name="", result_dir="",
                 gamma=0.95, learning_rate=0.001, optimizer="Adam"):
        self.state_size = state_size  # normalized previous days
        self.action_size = 3  # sit, buy, sell
        self.memory = deque(maxlen=1000)
        self.inventory = []
        self.model_name = model_name
        self.is_eval = is_eval
        self.gamma = gamma
        self.epsilon = 1.0
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = learning_rate
        self.optimizer = optimizer
        self.model = load_model(result_dir + "/" + model_name) if is_eval else self._model()
def _model(self):
    model = Sequential()
    model.add(Dense(units=64, input_dim=self.state_size, activation="relu"))
    model.add(Dense(units=32, activation="relu"))
    model.add(Dense(units=8, activation="relu"))
    model.add(Dense(self.action_size, activation="linear"))
    model.compile(loss="mse", optimizer=Adam(lr=0.001))
    return model

def act(self, state):
    if not self.is_eval and np.random.rand() <= self.epsilon:
        return random.randrange(self.action_size)

    options = self.model.predict(state)
    return np.argmax(options[0])
• Implements action value function by a neural network
• Accept state vector as input, output values for each action
• Specify squared loss and the Adam optimizer
• Take the argmax action, with ε-greedy exploration
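Note that compile() hard-codes Adam(lr=0.001) even though learning_rate and optimizer are constructor arguments; if those are meant to take effect, the compile line would presumably look something like this (a sketch, assuming the optimizer name maps onto a Keras optimizer class):

from keras.optimizers import Adam, RMSprop

optimizers = {"Adam": Adam, "RMSprop": RMSprop}
model.compile(loss="mse",
              optimizer=optimizers[self.optimizer](lr=self.learning_rate))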
• Compute a target value with the current network output
• Update parameters once (epochs=1)
def expReplay(self, batch_size):
    subsamples = random.sample(list(self.memory), len(self.memory))
    states, targets = [], []

    for state, action, reward, next_state, done in subsamples:
        target = reward
        if not done:
            target = reward + self.gamma * np.amax(self.model.predict(next_state)[0])

        target_f = self.model.predict(state)
        target_f[0][action] = target
        states.append(state)
        targets.append(target_f)

    self.model.fit(np.vstack(states), np.vstack(targets), epochs=1, verbose=0,
                   batch_size=batch_size)

    if self.epsilon > self.epsilon_min:
        self.epsilon *= self.epsilon_decay
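Note that random.sample(list(self.memory), len(self.memory)) draws every stored transition in random order, i.e. it shuffles the whole replay buffer rather than taking a batch_size subsample; batching is then left to model.fit. A stricter minibatch variant would be, as a sketch:

k = min(batch_size, len(self.memory))
subsamples = random.sample(list(self.memory), k)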
Running scripts
# Training with Q-learning
$ python train.py config/config2.yaml
• Training
# Learning curve on training data
$ python learning_curve.py config/config2.yaml ^GSPC
• Learning curve
Plotting learning curve
# Plotting learning curve
$ python plot_learning_curve.py config/config2.yaml ^GSPC
• Specifying the data on which profits will be
computed with the trained model
[Learning-curve plots: training data (^GSPC), test data (^GSPC_2011)]
• Not working well: profit increases on the test data but decreases on the training data
• We would usually expect the opposite pattern, i.e. overfitting to the training data
Plotting trading behavior
• Specifying the data and model (filename)
# Plotting trading on a model
$ python evaluate.py config/config2.yaml ^GSPC_2011 model_ep500
https://github.com/taku-y/q-trader