Hyperparameter Tuning Using W&B for FinRL Models
Deep Reinforcement Learning for Portfolio Allocation
Created on March 6|Last edited on August 25
Table of Contents
- Introduction
- Getting Started
- Download Data
- Perform Feature Engineering
- Build Environment
- Model Training
- Hyperparameter Tuning
- Trading
- References
Introduction
Our objective is to design an automated trading solution for multiple stock trading. We model the portfolio management process as a Markov Decision Process (MDP).
Our trading goal is to maximize a performance metric; in this example, we maximize the Sharpe ratio.
Getting Started
Libraries used:
- Model Training Framework: FinRL is an open-source deep reinforcement learning (DRL) framework for researchers and practitioners.
Download Data
The data for the Dow 30 constituent stocks used in this case study is obtained from the Yahoo Finance API. It contains open-high-low-close (OHLC) prices and volume.
- FinRL uses a YahooDownloader class to extract data.
- Download and save the data in a pandas DataFrame
data_df = YahooDownloader(start_date='2008-01-01',
                          end_date='2019-01-01',
                          ticker_list=dow_30_ticker).fetch_data()
Train & Trade Data Split:
- In real-life trading, the model needs to be updated periodically using rolling windows. In this article, we just cut the data once into the train and trade set.
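The once-off date split can be sketched in plain pandas. This is a stand-in for illustration only (FinRL ships its own split helper); the `split_by_date` name and the toy data below are ours, not FinRL's API:

```python
import pandas as pd

def split_by_date(df, train_end, trade_end):
    """Cut the data once into a train set and a trade set.

    A plain-pandas stand-in for FinRL's split helper; assumes an
    ISO-formatted 'date' column, so string comparison orders correctly.
    """
    train = df[df["date"] < train_end].reset_index(drop=True)
    trade = df[(df["date"] >= train_end) & (df["date"] < trade_end)]
    return train, trade.reset_index(drop=True)

# Toy data around the 2019-01-01 boundary used in this report
df = pd.DataFrame({
    "date": ["2018-12-28", "2018-12-31", "2019-01-02"],
    "close": [100.0, 101.0, 99.5],
})
train, trade = split_by_date(df, "2019-01-01", "2020-01-01")
```

Everything strictly before the boundary date lands in the train set; the trade set starts at the boundary.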
Perform Feature Engineering
FinRL uses a FeatureEngineer class to preprocess data.
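FinRL's FeatureEngineer adds technical indicators and related features on top of the raw OHLCV data. As a rough, self-contained sketch of what that preprocessing step looks like, here is a pandas example that adds two simple features per ticker; the `add_basic_features` helper is illustrative, not part of FinRL:

```python
import pandas as pd

def add_basic_features(df):
    """Add a 30-day SMA and a daily-return column per ticker.

    Illustrative only: FinRL's FeatureEngineer computes a richer
    feature set than these two columns.
    """
    df = df.sort_values(["tic", "date"]).copy()
    close_by_tic = df.groupby("tic")["close"]
    df["sma_30"] = close_by_tic.transform(
        lambda s: s.rolling(30, min_periods=1).mean()
    )
    df["daily_return"] = close_by_tic.pct_change()
    return df

# Toy data: three days of closes for a single ticker
df = pd.DataFrame({
    "tic": ["AAPL", "AAPL", "AAPL"],
    "date": ["2018-01-02", "2018-01-03", "2018-01-04"],
    "close": [100.0, 102.0, 101.0],
})
feat = add_basic_features(df)
```

Grouping by `tic` before computing rolling features matters: it keeps one stock's history from leaking into another's indicator values.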
Build Environment
- Assume we have $1,000,000 of initial capital on 2019/01/01.
- Now, we'll build the user-defined environment to learn from.
- The components of the reinforcement learning environment are:
- State: The state space S describes an agent's perception of the market.
- Action: The action space A describes the allowed actions an agent can take at a state.
- Reward function: The reward function r is the incentive for an agent to learn a profitable policy.
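To make the (state, action, reward) loop concrete, here is a toy portfolio environment in plain Python. It only illustrates the MDP components above; FinRL's actual environment class is far richer (transaction costs, technical-indicator state, etc.), and all names here are ours:

```python
import numpy as np

class MiniPortfolioEnv:
    """Toy environment illustrating the MDP components.

    State: today's prices. Action: portfolio weights. Reward: the
    change in portfolio value. Illustrative only, not FinRL's API.
    """

    def __init__(self, prices, initial_capital=1_000_000):
        self.prices = np.asarray(prices, dtype=float)  # shape (days, n_stocks)
        self.initial_capital = initial_capital
        self.reset()

    def reset(self):
        self.day = 0
        self.portfolio_value = self.initial_capital
        return self._state()

    def _state(self):
        # State s: the agent's perception of the market (today's prices)
        return self.prices[self.day]

    def step(self, weights):
        # Action a: target portfolio weights (assumed to sum to 1)
        weights = np.asarray(weights, dtype=float)
        asset_returns = self.prices[self.day + 1] / self.prices[self.day] - 1.0
        # Reward r: change in portfolio value from holding these weights
        reward = self.portfolio_value * float(weights @ asset_returns)
        self.portfolio_value += reward
        self.day += 1
        done = self.day >= len(self.prices) - 1
        return self._state(), reward, done

# Two days of prices for two stocks: one up 2%, one down 2%
env = MiniPortfolioEnv([[100.0, 50.0], [102.0, 49.0]])
state, reward, done = env.step([0.5, 0.5])  # equal-weight action
```

With equal weights, the +2% and -2% moves cancel, so this single step earns (approximately) zero reward and leaves the portfolio value unchanged.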
Model Training
- FinRL uses a DRLAgent class to implement the algorithms (A2C in the context of this report).
- Algorithm used:
- We use A2C for portfolio allocation because it is stable, cost-effective, fast, and works well with large batch sizes.
- A2C is a typical actor-critic algorithm, introduced to improve on vanilla policy gradient updates.
- Key feature: parallel gradient updating. Multiple workers collect experience in parallel, and their gradients are applied synchronously.
# Model Training
model_a2c = agent.get_model(model_name="a2c", model_kwargs = wandb.config)
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name='a2c',
                                total_timesteps=50000)
Version and store the model reliably
Train a model and log it as an artifact. More details in our guide to using Artifacts for model versioning.
trained_model_artifact = wandb.Artifact('A2C', type='model', description='trained A2C model')
trained_model_artifact.add_dir(PATH_TO_MODEL_DIR + config.TRAINED_MODEL_DIR)
run.log_artifact(trained_model_artifact)
Hyperparameter Tuning
In A2C, we tune:
- ent_coef: the entropy coefficient, which encourages exploration
- n_steps: the number of environment steps collected per gradient update
- learning_rate: the optimizer's step size
The goal is to maximize sharpe_ratio.
- Sharpe Ratio: the excess return that the strategy earns per unit of total risk.
- We compute the annualized Sharpe ratio as follows:
# Calculate the Sharpe ratio
# This is our objective for tuning
def calculate_sharpe(df):
    df['daily_return'] = df['account_value'].pct_change(1)
    if df['daily_return'].std() != 0:
        sharpe = (252 ** 0.5) * df['daily_return'].mean() / \
                 df['daily_return'].std()
        return sharpe
    else:
        return 0
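As a quick sanity check of this objective (the function is restated here so the snippet runs standalone): a flat account curve has zero volatility and hits the zero-std guard, while a steadily rising one yields a positive Sharpe ratio.

```python
import pandas as pd

def calculate_sharpe(df):
    # Annualized Sharpe ratio of the daily account-value returns
    df["daily_return"] = df["account_value"].pct_change(1)
    if df["daily_return"].std() != 0:
        return (252 ** 0.5) * df["daily_return"].mean() / df["daily_return"].std()
    return 0

# A flat account curve has zero volatility, so the guard returns 0
flat = pd.DataFrame({"account_value": [1e6, 1e6, 1e6]})
# A steadily growing account yields a positive Sharpe ratio
rising = pd.DataFrame({"account_value": [1.00e6, 1.01e6, 1.03e6]})
```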
Log sharpe_ratio as a top-level metric using wandb. For example:
# Calculate sharpe from the account value
sharpe = calculate_sharpe(df_account_value)
wandb.log({'sharpe_ratio': sharpe})
Then add sharpe_ratio as the metric and set its goal to maximize in the sweep config.
sweep_config = {
"name" : "finrl-sweep",
"method" : "grid",
"metric": {
"name": "sharpe_ratio",
"goal": "maximize"
},
"parameters" : {
"ent_coef" : {
"distribution": "categorical",
"values": [0.0001, 0.001, 0.01, 0.1]
},
"n_steps" : {
"distribution": "categorical",
"values" : [5, 10, 15]
},
"learning_rate" :{
"distribution": "categorical",
"values": [0.00009, 0.0001, 0.001, 0.01, 0.1]
}
}
}
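Since the method is grid, W&B tries every combination of the listed values, i.e. 4 × 3 × 5 = 60 runs. The grid can be enumerated locally with nothing but the standard library, which is handy for estimating sweep cost before launching:

```python
from itertools import product

# Same values as the sweep config above
param_values = {
    "ent_coef": [0.0001, 0.001, 0.01, 0.1],
    "n_steps": [5, 10, 15],
    "learning_rate": [0.00009, 0.0001, 0.001, 0.01, 0.1],
}

names = list(param_values)
grid = [dict(zip(names, combo)) for combo in product(*param_values.values())]
print(len(grid))  # 4 * 3 * 5 = 60 configurations
```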
[W&B sweep panels: finrl-sweep 1 and finrl-sweep 2]
Trading
We use the A2C model to perform portfolio allocation of the Dow 30 stocks.
[W&B sweep panel: finrl-sweep]
Evaluating the performance of our trading strategy
There are many ways to evaluate and analyze a trading algorithm. Beyond basic measures like a cumulative returns plot, you may want to dive deeper into what your algorithm is doing. For example, you might look at how your portfolio allocation changes over time, or what your exposure to certain risk factors is.
At the core of pyfolio are tear sheets that summarize information about a backtest. Each tear sheet returns a number of plots, as well as other information, about a given topic. To generate all tear sheets at once, simply call create_full_tear_sheet on the backtest returns. This shows charts and analysis of the strategy's returns.
- The returns parameter is the strategy's daily return series (required).
- benchmark_rets is the benchmark's return series.
pyfolio.create_full_tear_sheet(
returns=test_returns, benchmark_rets=baseline_returns, set_context=False
)
wandb.log({"create_full_tear_sheet": plt})
[W&B run set panels]
- The Returns plot above was created using pyfolio's plot_returns method. It plots raw returns over time. Backtest returns are in green, and out-of-sample (live trading) returns are in red.
axes = pyfolio.plotting.plot_returns(returns=test_returns,
                                     live_start_date='2019-01-01')
wandb.log({"Returns": axes.figure})
- We also plotted the return quantiles for the AAPL ticker, showing daily, weekly, and monthly return distributions, using pyfolio's plot_return_quantiles method.
axes = pyfolio.plotting.plot_return_quantiles(returns=test_returns,
                                              live_start_date='2019-01-01')
wandb.log({"return_quantiles": axes.figure})
References
- Liu, Xiao-Yang, et al. "FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance." Capital Markets: Market Microstructure eJournal (2020).