Hyperparameter Tuning Using W&B for FinRL Models
Deep Reinforcement Learning for Portfolio Allocation
Created on March 6|Last edited on August 25
Table of Contents
- Introduction
- Getting Started
- Download Data
- Perform Feature Engineering
- Build Environment
- Model Training
- Hyperparameter Tuning
- Trading
- References
Introduction
Our objective is to design an automated trading solution for multiple stock trading. We model the portfolio management process as a Markov Decision Process (MDP).
Our trading goal is to maximize a performance metric; in this example, we maximize the Sharpe ratio.
Getting Started
Libraries used:
- Model Training Framework: FinRL is an open-source deep reinforcement learning (DRL) framework for researchers and practitioners.
Download Data
The data for the Dow 30 constituent stocks used in this case study is obtained from the Yahoo Finance API. It contains open-high-low-close (OHLC) prices and volume.
- FinRL uses a YahooDownloader class to extract data.
- Download and save the data in a pandas DataFrame
data_df = YahooDownloader(start_date='2008-01-01',
                          end_date='2019-01-01',
                          ticker_list=dow_30_ticker).fetch_data()
Train & Trade Data Split:
- In real-life trading, the model needs to be updated periodically using rolling windows. In this article, we just cut the data once into the train and trade set.
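The once-off date split can be sketched in plain pandas. This is a stand-in for illustration only (FinRL ships its own split helper); the `split_by_date` name and the toy data below are ours, not FinRL's API:

```python
import pandas as pd

def split_by_date(df, train_end, trade_end):
    """Cut the data once into a train set and a trade set.

    A plain-pandas stand-in for FinRL's split helper; assumes an
    ISO-formatted 'date' column, so string comparison orders correctly.
    """
    train = df[df["date"] < train_end].reset_index(drop=True)
    trade = df[(df["date"] >= train_end) & (df["date"] < trade_end)]
    return train, trade.reset_index(drop=True)

# Toy data around the 2019-01-01 boundary used in this report
df = pd.DataFrame({
    "date": ["2018-12-28", "2018-12-31", "2019-01-02"],
    "close": [100.0, 101.0, 99.5],
})
train, trade = split_by_date(df, "2019-01-01", "2020-01-01")
```

Everything strictly before the boundary date lands in the train set; the trade set starts at the boundary.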
Perform Feature Engineering
FinRL uses a FeatureEngineer class to preprocess data.
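FinRL's FeatureEngineer adds technical indicators and related features on top of the raw OHLCV data. As a rough, self-contained sketch of what that preprocessing step looks like, here is a pandas example that adds two simple features per ticker; the `add_basic_features` helper is illustrative, not part of FinRL:

```python
import pandas as pd

def add_basic_features(df):
    """Add a 30-day SMA and a daily-return column per ticker.

    Illustrative only: FinRL's FeatureEngineer computes a richer
    feature set than these two columns.
    """
    df = df.sort_values(["tic", "date"]).copy()
    close_by_tic = df.groupby("tic")["close"]
    df["sma_30"] = close_by_tic.transform(
        lambda s: s.rolling(30, min_periods=1).mean()
    )
    df["daily_return"] = close_by_tic.pct_change()
    return df

# Toy data: three days of closes for a single ticker
df = pd.DataFrame({
    "tic": ["AAPL", "AAPL", "AAPL"],
    "date": ["2018-01-02", "2018-01-03", "2018-01-04"],
    "close": [100.0, 102.0, 101.0],
})
feat = add_basic_features(df)
```

Grouping by `tic` before computing rolling features matters: it keeps one stock's history from leaking into another's indicator values.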
Build Environment
- Assume we have $1,000,000 of initial capital on 2019/01/01.
- Now, we'll build the user-defined environment to learn from.
- The components of the reinforcement learning environment are:
- State: The state space S describes an agent's perception of the market.
- Action: The action space A describes the allowed actions an agent can take at a state.
- Reward function: The reward function r is the incentive for an agent to learn a profitable policy.
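To make the (state, action, reward) loop concrete, here is a toy portfolio environment in plain Python. It only illustrates the MDP components above; FinRL's actual environment class is far richer (transaction costs, technical-indicator state, etc.), and all names here are ours:

```python
import numpy as np

class MiniPortfolioEnv:
    """Toy environment illustrating the MDP components.

    State: today's prices. Action: portfolio weights. Reward: the
    change in portfolio value. Illustrative only, not FinRL's API.
    """

    def __init__(self, prices, initial_capital=1_000_000):
        self.prices = np.asarray(prices, dtype=float)  # shape (days, n_stocks)
        self.initial_capital = initial_capital
        self.reset()

    def reset(self):
        self.day = 0
        self.portfolio_value = self.initial_capital
        return self._state()

    def _state(self):
        # State s: the agent's perception of the market (today's prices)
        return self.prices[self.day]

    def step(self, weights):
        # Action a: target portfolio weights (assumed to sum to 1)
        weights = np.asarray(weights, dtype=float)
        asset_returns = self.prices[self.day + 1] / self.prices[self.day] - 1.0
        # Reward r: change in portfolio value from holding these weights
        reward = self.portfolio_value * float(weights @ asset_returns)
        self.portfolio_value += reward
        self.day += 1
        done = self.day >= len(self.prices) - 1
        return self._state(), reward, done

# Two days of prices for two stocks: one up 2%, one down 2%
env = MiniPortfolioEnv([[100.0, 50.0], [102.0, 49.0]])
state, reward, done = env.step([0.5, 0.5])  # equal-weight action
```

With equal weights, the +2% and -2% moves cancel, so this single step earns (approximately) zero reward and leaves the portfolio value unchanged.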
Model Training
- FinRL uses a DRLAgent class to implement the algorithms (A2C in the context of this report).
- Algorithm used:
- We use A2C for portfolio allocation because it is stable, cost-effective, fast, and works well with large batch sizes.
- A2C is a typical actor-critic algorithm, introduced to improve on vanilla policy gradient updates.
- Key feature: parallel gradient updating. Multiple workers collect experience in parallel, and their gradients are applied synchronously.
# Model Training
model_a2c = agent.get_model(model_name="a2c", model_kwargs = wandb.config)
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name='a2c',
                                total_timesteps=50000)
Version and store the model reliably
Train a model and log it as an artifact. More details in our guide to using Artifacts for model versioning.
trained_model_artifact = wandb.Artifact('A2C', type='model', description='trained A2C model')
trained_model_artifact.add_dir(PATH_TO_MODEL_DIR + config.TRAINED_MODEL_DIR)
run.log_artifact(trained_model_artifact)
Hyperparameter Tuning
In A2C, we tune:
- ent_coef: the entropy coefficient, which encourages exploration
- n_steps: the number of environment steps collected per gradient update
- learning_rate: the optimizer's step size
The goal is to maximize sharpe_ratio.
- Sharpe Ratio: the excess return that the strategy earns per unit of total risk.
- We compute the annualized Sharpe ratio as follows:
# Calculate the Sharpe ratio
# This is our objective for tuning
def calculate_sharpe(df):
    df['daily_return'] = df['account_value'].pct_change(1)
    if df['daily_return'].std() != 0:
        sharpe = (252 ** 0.5) * df['daily_return'].mean() / \
                 df['daily_return'].std()
        return sharpe
    else:
        return 0
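As a quick sanity check of this objective (the function is restated here so the snippet runs standalone): a flat account curve has zero volatility and hits the zero-std guard, while a steadily rising one yields a positive Sharpe ratio.

```python
import pandas as pd

def calculate_sharpe(df):
    # Annualized Sharpe ratio of the daily account-value returns
    df["daily_return"] = df["account_value"].pct_change(1)
    if df["daily_return"].std() != 0:
        return (252 ** 0.5) * df["daily_return"].mean() / df["daily_return"].std()
    return 0

# A flat account curve has zero volatility, so the guard returns 0
flat = pd.DataFrame({"account_value": [1e6, 1e6, 1e6]})
# A steadily growing account yields a positive Sharpe ratio
rising = pd.DataFrame({"account_value": [1.00e6, 1.01e6, 1.03e6]})
```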
Log sharpe_ratio as a top-level metric using wandb. For example:
# Calculate sharpe from the account value
sharpe = calculate_sharpe(df_account_value)
wandb.log({'sharpe_ratio': sharpe})
Then add sharpe_ratio as the metric and set its goal to maximize in the sweep config.
sweep_config = {
"name" : "finrl-sweep",
"method" : "grid",
"metric": {
"name": "sharpe_ratio",
"goal": "maximize"
},
"parameters" : {
"ent_coef" : {
"distribution": "categorical",
"values": [0.0001, 0.001, 0.01, 0.1]
},
"n_steps" : {
"distribution": "categorical",
"values" : [5, 10, 15]
},
"learning_rate" :{
"distribution": "categorical",
"values": [0.00009, 0.0001, 0.001, 0.01, 0.1]
}
}
}
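Since the method is grid, W&B tries every combination of the listed values, i.e. 4 × 3 × 5 = 60 runs. The grid can be enumerated locally with nothing but the standard library, which is handy for estimating sweep cost before launching:

```python
from itertools import product

# Same values as the sweep config above
param_values = {
    "ent_coef": [0.0001, 0.001, 0.01, 0.1],
    "n_steps": [5, 10, 15],
    "learning_rate": [0.00009, 0.0001, 0.001, 0.01, 0.1],
}

names = list(param_values)
grid = [dict(zip(names, combo)) for combo in product(*param_values.values())]
print(len(grid))  # 4 * 3 * 5 = 60 configurations
```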
[W&B sweep panels: finrl-sweep 1 and finrl-sweep 2]
Trading
We use the A2C model to perform portfolio allocation of the Dow 30 stocks.
[W&B sweep panel: finrl-sweep]
Evaluating the performance of our trading strategy
There are many ways to evaluate and analyze a trading algorithm. Beyond basic measures like a cumulative returns plot, you may want to dive deeper into what your algorithm is doing. For example, you might look at how your portfolio allocation changes over time, or what your exposure to certain risk factors is.
At the core of pyfolio are tear sheets that summarize information about a backtest. Each tear sheet returns a number of plots, as well as other information, about a given topic. To generate all tear sheets at once, simply call create_full_tear_sheet on the backtest returns. This shows charts and analysis of the strategy's returns.
- The returns parameter is the strategy's daily return series (required).
- benchmark_rets is the benchmark's return series.
pyfolio.create_full_tear_sheet(
returns=test_returns, benchmark_rets=baseline_returns, set_context=False
)
wandb.log({"create_full_tear_sheet": plt})
[W&B run set panels]
- The Returns plot above was created using pyfolio's plot_returns method. It plots raw returns over time. Backtest returns are in green, and out-of-sample (live trading) returns are in red.
axes = pyfolio.plotting.plot_returns(returns=test_returns,
                                     live_start_date='2019-01-01')
wandb.log({"Returns": axes.figure})
- We also plotted the return quantiles for the AAPL ticker, showing daily, weekly, and monthly return distributions, using pyfolio's plot_return_quantiles method.
axes = pyfolio.plotting.plot_return_quantiles(returns=test_returns,
                                              live_start_date='2019-01-01')
wandb.log({"return_quantiles": axes.figure})
References
- Liu, Xiao-Yang, et al. "FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance." Capital Markets: Market Microstructure eJournal (2020).