
Insanely fast Lunar Lander with AWAC

Created on April 5 | Last edited on April 6

Section 1



Insanely fast training using AWAC: viable trajectories after only 1500 updates, 1000 of them on offline training data. Batch size 128. 1000 randomly selected transitions from a dataset of 100,000 expert trajectories.
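AWAC trains the actor by weighted maximum likelihood on actions from the buffer. Each action's log-probability is weighted by exp(A(s, a) / λ), where A is the critic's advantage estimate, so good actions in the data are imitated and poor ones are largely ignored. The sketch below computes those weights and the resulting actor loss in plain Python. It assumes the run's `lam` setting (0.3) plays the role of λ and that log-probabilities and advantages are already available. The function names and the weight cap are hypothetical illustrations, not this run's code.

```python
import math
from typing import List, Sequence


def awac_weights(advantages: Sequence[float], lam: float = 0.3,
                 max_weight: float = 100.0) -> List[float]:
    """Per-sample AWAC weights exp(A / lam).

    The exponent is capped at log(max_weight) so one large advantage cannot
    overflow or dominate the batch (an assumed safeguard, not from the report).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    log_cap = math.log(max_weight)
    return [math.exp(min(a / lam, log_cap)) for a in advantages]


def awac_actor_loss(log_probs: Sequence[float], advantages: Sequence[float],
                    lam: float = 0.3) -> float:
    """Advantage-weighted negative log-likelihood of the buffer's actions."""
    if len(log_probs) != len(advantages) or not log_probs:
        raise ValueError("need equal-length, non-empty inputs")
    weights = awac_weights(advantages, lam)
    # Higher-advantage actions get exponentially more weight, pulling the
    # policy toward the best actions already present in the data.
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)
```

For example, with λ = 0.3 an advantage of 0.3 gets weight e ≈ 2.72, an advantage of 0 gets weight 1, and an advantage of -0.3 gets weight 1/e ≈ 0.37. A smaller λ sharpens this preference; a larger one moves the update closer to plain behavior cloning.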

Part of what helped may have been drawing at random from such a large training set of good actions and then doing only limited fine-tuning.
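The schedule described above (random draws from a large offline set, then a short fine-tuning phase) can be sketched as a two-phase loop. This is a minimal sketch that rests on several assumptions. Minibatches are drawn uniformly with replacement from one shared buffer. The 1500 updates split into 1000 offline updates followed by 500 online updates, each with one environment step. The hypothetical `update_fn` and `env_step_fn` callbacks stand in for the AWAC gradient step and a Lunar Lander environment step. It is not the code behind this run.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def two_phase_training(offline_data: Sequence[T],
                       offline_updates: int,
                       online_updates: int,
                       batch_size: int,
                       update_fn: Callable[[List[T]], None],
                       env_step_fn: Callable[[], T],
                       seed: int = 0) -> List[T]:
    """Offline pretraining on a fixed dataset, then brief online fine-tuning.

    Minibatches are sampled uniformly at random (with replacement) from the
    whole buffer in both phases.
    """
    rng = random.Random(seed)
    buffer: List[T] = list(offline_data)
    if not buffer:
        raise ValueError("offline_data must not be empty")

    # Phase 1: learn purely from pre-collected transitions; no env steps.
    for _ in range(offline_updates):
        update_fn(rng.choices(buffer, k=batch_size))

    # Phase 2: collect one new transition, then take one gradient update.
    for _ in range(online_updates):
        buffer.append(env_step_fn())
        update_fn(rng.choices(buffer, k=batch_size))

    return buffer
```

Called as `two_phase_training(data, 1000, 500, 128, update_fn, env_step_fn)`, this performs 1500 updates of batch size 128 but only 500 environment steps, which is why so little online interaction is needed when the offline data is already good.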

[video panel: no media was logged for this run]
Run: sunny-rain-229

State: Finished
User: duanenielsen
Runtime: 1m 38s
Sweep: -

Config
batch_size: 128
buffer_capacity: -
buffer_steps: 1000
debug: true
demo: false
device: cpu
discount: 0.99
env_name: LunarLander-v2
env_render: false
env_reward_bias: 0
env_reward_scale: 0.005
hidden_dim: 64
hidden_size: -
lam: 0.3
load_buffer: lander_big.pkl
max_steps: 1600
optim_lr: 0.0001
precision: torch.float32
recency: 1
run_dir: runs/run_437
run_id: 437
seed: 0
silent: true
test_capture: true
test_episodes: -
test_samples: 5
test_steps: 500

Summary
best_mean_return: 0.092869
best_stdev_return: 0.51327
epi_len: 213
epi_reward: 0.10875
global_step: 1600
kl_mean: 4.5424e-7
kl_std: 0.0000011374
last_mean_return: 0.092869
last_stdev_return: 0.51327
offline_steps: 104798
test_len: -
test_mean_return: 0.092869
test_number: 2
test_reward: -
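The logged returns are small because the run scales the environment reward (env_reward_scale = 0.005, env_reward_bias = 0). As a sketch, assume each step's training reward is scale * r + bias. This is an assumption about how the run applies these two settings, which the report does not confirm. Under that assumption, an undiscounted logged return can be mapped back to raw Lunar Lander reward units by inverting the transform. The helper name is hypothetical.

```python
def unscale_return(scaled_return: float, scale: float = 0.005,
                   bias: float = 0.0, episode_len: int = 0) -> float:
    """Convert a return logged on the scaled reward back to raw env units.

    Assumes each step's reward was scale * r + bias, so an (undiscounted)
    episode of length episode_len has scaled return
    scale * R + bias * episode_len.
    """
    if scale == 0:
        raise ValueError("scale must be non-zero")
    return (scaled_return - bias * episode_len) / scale
```

With bias 0 the episode length drops out, so under this assumption the logged test_mean_return of 0.092869 corresponds to a raw return of about 18.6.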


