Rickstaa's workspace
Runs
1,918
Name
State
Notes
User
Tags
Created
Runtime
Sweep
ac_kwargs.activation
ac_kwargs.activation.actor
ac_kwargs.activation.critic
ac_kwargs.hidden_sizes.actor
ac_kwargs.hidden_sizes.critic
ac_kwargs.output_activation
ac_kwargs.output_activation.actor
ac_kwargs.output_activation.critic
actor_critic
adaptive_temperature
alpha
alpha3
batch_size
device
env
env_class
epochs
exp_name
export
gamma
horizon_length
labda
lr_a
lr_a_final
lr_c
lr_c_final
lr_decay_ref
lr_decay_type
max_ep_len
num_test_episodes
opt_type
polyak
replay_size
save_freq
seed
start_steps
steps_per_epoch
steps_per_update
update_after
update_every
Finished
-
rickstaa
5mo 27d 3h 45m 50s
-
-
nn.ReLU
nn.ReLU
256
121.14286
-
nn.ReLU
-
-
true
1
-
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
223.71429
["han2020_reproduction_sac_cartpole_cost_alpha3_tune_exp_lac_critic","han2020_reproduction_sac_cartpole_cost_alpha3_tune_exp_lac_critic_big","han2020_reproduction_sac_fetch_reach_alpha3_tune_exp_lac_critic","han2020_reproduction_sac_fetch_reach_alpha3_tune_exp_lac_critic_big","han2020_reproduction_sac_oscillator_alpha3_tune_exp_lac_critic","han2020_reproduction_sac_oscillator_complicated_alpha3_tune_exp_lac_critic"]
false
0.995
-
-
0.0001
4.5714e-10
0.0003
1.3714e-9
step
linear
271.42857
10
minimize
0.995
1000000
10
26203.8
0
2048
50
1000
100
Finished
-
rickstaa
35m 9s
-
-
nn.ReLU
nn.ReLU
256
256
-
nn.ReLU
-
-
true
2
-
256
gpu:1
OscillatorComplicated-v1
stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated
98
han2020_reproduction_sac_oscillator_complicated_alpha3_tune_exp_bigger_initial_alpha
false
0.995
-
-
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
26203.8
0
2048
50
1000
100
Finished
-
rickstaa
26m 20s
-
-
nn.ReLU
nn.ReLU
256
256
-
nn.ReLU
-
-
true
1
-
256
gpu:1
Oscillator-v1
stable_gym.envs.biological.oscillator.oscillator.Oscillator
49
han2020_reproduction_sac_oscillator_alpha3_tune_exp_bigger_steps_per_update
false
0.995
-
-
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
-
rickstaa
3d 12h 22m 43s
-
-
nn.ReLU
nn.ReLU
256
99.2
-
nn.ReLU
-
-
true
2
-
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
186
["han2020_reproduction_sac_cartpole_cost_alpha3_tune_exp_sac_extra_all","han2020_reproduction_sac_fetch_reach_alpha3_tune_exp_sac_extra_all","han2020_reproduction_sac_oscillator_alpha3_tune_exp_sac_extra_all","han2020_reproduction_sac_oscillator_complicated_alpha3_tune_exp_sac_extra_all"]
false
0.995
-
-
0.0001
5.5333e-10
0.0003
1.6600e-9
step
linear
290
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
-
rickstaa
3d 13h 1m 3s
-
-
nn.ReLU
nn.ReLU
256
256
-
nn.ReLU
-
-
true
1
-
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
220.25
["han2020_reproduction_sac_cartpole_cost_alpha3_tune_exp_different_steps_per_update","han2020_reproduction_sac_fetch_reach_alpha3_tune_exp_different_steps_per_update","han2020_reproduction_sac_oscillator_complicated_alpha3_tune_exp_lac_critic_different_steps_per_update"]
false
0.995
-
-
0.0001
4.4167e-10
0.0003
1.3250e-9
step
linear
262.5
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
-
rickstaa
2d 14h 59m 21s
-
-
nn.ReLU
nn.ReLU
256
256
-
nn.ReLU
-
-
true
2
-
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
208
["han2020_reproduction_sac_cartpole_cost_alpha3_tune_exp_bigger_initial_alpha","han2020_reproduction_sac_fetch_reach_alpha3_tune_exp_bigger_initial_alpha","han2020_reproduction_sac_oscillator_alpha3_tune_exp_bigger_initial_alpha"]
false
0.995
-
-
0.0001
4.4167e-10
0.0003
1.3250e-9
step
linear
262.5
10
minimize
0.995
1000000
10
26203.8
0
2048
50
1000
100
Finished
-
rickstaa
2d 5h 40m 55s
-
-
nn.ReLU
nn.ReLU
256
256
-
nn.ReLU
-
-
true
1
-
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
186
["han2020_reproduction_sac_cartpole_cost_alpha3_tune_exp","han2020_reproduction_sac_fetch_reach_alpha3_tune_exp","han2020_reproduction_sac_fetch_reach_alpha3_tune_infinite_horizon_exp","han2020_reproduction_sac_oscillator_alpha3_tune_exp","han2020_reproduction_sac_oscillator_complicated_alpha3_tune_exp"]
false
0.995
-
-
0.0001
5.5333e-10
0.0003
1.6600e-9
step
linear
290
10
minimize
0.995
1000000
10
26203.8
0
2048
50
1000
100
Finished
An extra experiment to see the effect of the smaller horizon size found in Han et al.'s codebase.
rickstaa
extra
16h 39m 20s
-
-
nn.ReLU
nn.ReLU
256
112
-
nn.ReLU
-
-
true
2
0.46667
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
195.75
["han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_small_horizon_alp0-1","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_small_horizon_alp0-3","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_small_horizon_alp1-0","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_small_horizon_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_small_horizon_alp0-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_small_horizon_alp1-0","han2020_reproduction_lac_oscillator_alpha3_tune_exp_small_horizon_alp0-1","han2020_reproduction_lac_oscillator_alpha3_tune_exp_small_horizon_alp0-3","han2020_reproduction_lac_oscillator_alpha3_tune_exp_small_horizon_alp1-0","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_horizon_alp0-1"]
false
0.99
2
0.99
0.0001
6.0833e-10
0.0003
1.8250e-9
step
linear
312.5
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
An extra experiment to see the effect of the lower lambda learning rate found in Han et al.'s codebase (1e-4 vs 3e-4).
rickstaa
extra
22h 27m 55s
-
-
nn.ReLU
nn.ReLU
256
99.2
-
nn.ReLU
-
-
true
2
0.46667
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
186
["han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_lambda_lr_check_alp0-3","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_lambda_lr_check_alp1-0","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_lambda_lr_check_alp0-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_lambda_lr_check_alp1-0","han2020_reproduction_lac_oscillator_alpha3_tune_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_oscillator_alpha3_tune_exp_lambda_lr_check_alp0-3","han2020_reproduction_lac_oscillator_alpha3_tune_exp_lambda_lr_check_alp1-0"]
false
0.991
4
0.99
0.0001
5.5333e-10
0.0003
1.6600e-9
step
linear
290
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
An extra experiment to check the effect of the smaller actor found in Han et al.'s codebase.
rickstaa
extra
18h 37m 48s
-
-
nn.ReLU
nn.ReLU
64
99.2
-
nn.ReLU
-
-
true
2
0.46667
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
186
["han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_small_actor_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_small_actor_alp0-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_small_actor_alp1-0","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_small_actor_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_small_actor_alp0-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_small_actor_alp1-0","han2020_reproduction_lac_oscillator_alpha3_tune_exp_small_actor_alp0-1","han2020_reproduction_lac_oscillator_alpha3_tune_exp_small_actor_alp0-3","han2020_reproduction_lac_oscillator_alpha3_tune_exp_small_actor_alp1-0","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_actor_alp0-1"]
false
0.991
4
0.99
0.0001
5.5333e-10
0.0003
1.6600e-9
step
linear
290
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
Here we decreased the lambda learning rate from the 3e-4 specified in Han et al.'s paper to the 1e-4 specified in their codebase. Unfortunately, we set the wrong final lambda learning rate when making this change.
rickstaa
20h 11m 20s
-
-
nn.ReLU
nn.ReLU
256
99.2
-
nn.ReLU
-
-
true
2
0.46667
256
gpu:0
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
186
["han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_lambda_lr_check_alp0-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_lambda_lr_check_alp1-0","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_lambda_lr_check_alp0-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_lambda_lr_check_alp1-0","han2020_reproduction_lac_oscillator_alpha3_tune_exp_lambda_lr_check_alp0-1","han2020_reproduction_lac_oscillator_alpha3_tune_exp_lambda_lr_check_alp0-3","han2020_reproduction_lac_oscillator_alpha3_tune_exp_lambda_lr_check_alp1-0","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_lr_lambda_check_alp0-3"]
false
0.991
4
0.99
0.0001
5.5333e-10
0.0003
1.6600e-9
step
linear
290
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
A double check to see whether our earlier experiment with a longer CompOscillator training was performed correctly.
rickstaa
extra
12h 58m 52s
-
-
nn.ReLU
nn.ReLU
256
176
-
nn.ReLU
-
-
true
2
0.8
256
gpu:1
OscillatorComplicated-v1
stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated
98
["han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp0-1","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp0-2","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp0-4","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp0-5","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp0-7","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp0-8","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp1-1","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp1-2","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp1-4","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_longer_alp1-5"]
false
0.99
5
0.99
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
An extra experiment to see the effect of the smaller critic found in Han et al.'s codebase.
rickstaa
extra
16h 49m 28s
-
-
nn.ReLU
nn.ReLU
256
48
-
nn.ReLU
-
-
true
2
0.46667
256
gpu:1
OscillatorComplicated-v1
stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated
98
["han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_critic_alp0-1","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_critic_alp0-3","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_critic_alp1-0"]
false
0.99
5
0.99
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
34912.5
0
2048
80
1000
100
Finished
Here we decreased the critic network size of the CompOscillator environment to [64,64,16] to investigate an inconsistency in Han et al.'s research. This is the short version where only 1e5 environment interactions were used per training iteration.
rickstaa
extra
incorrect
1h 14m 42s
-
-
nn.ReLU
nn.ReLU
256
48
-
nn.ReLU
-
-
true
2
0.46667
256
gpu:1
OscillatorComplicated-v1
stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated
49
["han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_critic_alp0-1","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_critic_alp0-3","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_small_critic_alp1-0"]
false
0.99
5
0.99
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
Here we added 5 extra seeds to the Oscillator and CompOscillator training performed in the alpha3 hyperparameter tuning of our reproduction study. The first version of the CompOscillator training, labeled 'short', uses too few environment interactions (1e5 instead of 2e5). The second version uses the correct number.
rickstaa
extra
3d 6h 25m 28s
-
-
nn.ReLU
nn.ReLU
256
176
-
nn.ReLU
-
-
true
2
0.8
256
gpu:1
["Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated"]
65.33333
["han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp0-2","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp0-3","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp0-4","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp0-8","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp1-0","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp1-1","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp1-2","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp1-3","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp1-4","han2020_reproduction_lac_oscillator_complicated_alpha3_tune_exp_alp1-5"]
false
0.99
5
0.99
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
346296.6
0
2048
80
1000
100
Finished
Here, we increased the total training steps used for the Oscillator and CompOscillator training from 1e5 to 2e5 in the alpha3 hyperparameter tuning of our reproduction study.
rickstaa
extra
19h 39m 20s
-
-
nn.ReLU
nn.ReLU
256
176
-
nn.ReLU
-
-
true
2
0.8
256
gpu:1
["Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated"]
98
["han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp0-2","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp0-3","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp0-4","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp0-6","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp0-7","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp0-9","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp1-0","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp1-2","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp1-3","han2020_reproduction_lac_oscillator_alpha3_tune_exp_extra_long_alp1-4"]
false
0.99
5
0.99
0.0001
1.0000e-9
0.0003
3.0000e-9
step
linear
400
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
The experiments of the alpha3 hyperparameter tuning performed in our reproduction study. One experiment, the CompOscillator, was later replaced since the step size mentioned in Han et al.'s paper was inconsistent, and we later decided to increase it.
rickstaa
3d 4h 10m 22s
-
-
nn.ReLU
nn.ReLU
256
99.2
-
nn.ReLU
-
-
true
2
0.8
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
176.2
["han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp0-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp0-4","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp0-5","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp0-6","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp0-7","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp0-9","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp1-0","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp1-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp1-2","han2020_reproduction_lac_fetch_reach_alpha3_tune_infinite_horizon_exp_alp1-3"]
false
0.991
4
0.99
0.0001
5.5333e-10
0.0003
1.6600e-9
step
linear
290
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
A small pilot study where we use a constant 3e-4 lambda learning rate and decay the other learning rates to 1e-10.
rickstaa
2d 8h 27m 57s
-
-
nn.ReLU
nn.ReLU
256
112
-
nn.ReLU
-
-
true
2
0.8
256
gpu:1
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
183.5
["han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-1","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-2","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-6","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-8","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-9","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-1","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-2","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-3","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-4","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-5"]
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
312.5
10
minimize
0.995
1000000
10
26203.8
0
2048
80
1000
100
Finished
A pilot study where we check how the algorithm behaves when we let the lambda learning rate decay linearly and set it equal to the actor learning rate.
rickstaa
9d 13h 33m 30s
-
-
nn.ReLU
nn.ReLU
256
110
-
nn.ReLU
-
-
true
2
0.795
256
["cpu","gpu","gpu:1"]
["CartPoleCost-v1","FetchReachCost-v1","Oscillator-v1","OscillatorComplicated-v1"]
["stable_gym.envs.biological.oscillator.oscillator.Oscillator","stable_gym.envs.biological.oscillator_complicated.oscillator_complicated.OscillatorComplicated","stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost","stable_gym.envs.robotics.fetch.fetch_reach_cost.fetch_reach_cost.FetchReachCost"]
195.71875
["han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-1","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-2","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-3","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-4","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-5","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_alp1-1","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_alp1-2","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_alp1-3","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_alp1-4","han2020_reproduction_lac_fetch_reach_alpha3_tune_exp_alp1-5"]
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
310.9375
10
minimize
0.995
1000000
10
25085.41875
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 41m 30s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.5
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-5
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
234
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 43m 28s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.5
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-5
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
48104
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 42m 46s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.2
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-2
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
3658
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 43m 2s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.2
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-2
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
78456
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 42m 32s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.3
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-3
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
567
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 47m 58s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.4
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-4
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
234
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 46m 37s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.4
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-4
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
48104
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 46m 34s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.3
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-3
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
78456
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 47m 57s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.4
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-4
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
3658
0
2048
80
1000
100
Finished
-
rickstaa
cartpole
pilot
3h 46m 5s
-
-
nn.ReLU
nn.ReLU
[256,256]
[64,64,16]
-
nn.ReLU
-
-
true
2
1.5
256
gpu:1
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-5
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
567
0
2048
80
1000
100
Finished
A small pilot study to check how step-based learning rate decay works on the GPU.
rickstaa
pilot
18h 19m 48s
-
-
nn.ReLU
nn.ReLU
256
48
-
nn.ReLU
-
-
true
2
0.55
256
gpu
CartPoleCost-v1
stable_gym.envs.classic_control.cartpole_cost.cartpole_cost.CartPoleCost
489
["han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-1","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-2","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-3","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-4","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-5","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-6","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-7","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-8","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp0-9","han2020_reproduction_lac_cartpole_cost_alpha3_tune_exp_alp1-0"]
false
0.99
5
0.99
0.0001
1.0000e-10
0.0003
1.0000e-10
step
linear
250
10
minimize
0.995
1000000
10
48104
0
2048
80
1000
100