
Classic Control: Our PPO vs openai/baselines' PPO

Created on January 3 | Last edited on April 12


[Figure: learning curves for the run set — x-axis: Steps (up to ~400k), y-axis: Episodic Return (up to ~500)]
Run set: 12 runs (table below shows the selected run, 1-1 of 1)

  State: Finished
  User: costa-huang
  Runtime: 11d 18h 42m 33s
  Sweep: -
  alg: ppo2
  env: CartPole-v1
  exp_name: ["baselines-ppo2-mlp","baselines-ppo2-mlp-seperate-networks","ppo","ppo_shared"]
  network: mlp
  num_env: 4
  num_timesteps: 500000
  play: false
  reward_scale: 1
  save_video_interval: 0
  save_video_length: 200
  seed: 2
  track: true
  anneal_lr: true
  batch_size: 512
  capture_video: false
  clip_coef: 0.2
  clip_vloss: true
  cuda: false
  ent_coef: 0.01
  gae: true
  gae_lambda: 0.95
  gamma: 0.99
  gym_id: CartPole-v1
  learning_rate: 0.00025
  max_grad_norm: 0.5
  minibatch_size: 128
  norm_adv: true
  num_envs: 4
  num_minibatches: 4
  num_steps: 128
  torch_deterministic: true
  total_timesteps: 500000
  update_epochs: 4
  vf_coef: 0.5
  wandb_entity: vwxyzjn
  wandb_project_name: ppo-details
  aux_batch_size: -
  aux_minibatch_size: -
  beta_clone: -
  e_auxiliary: -
  e_policy: -
  n_aux_grad_accum: -
  n_aux_minibatch: -
  n_iteration: -
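The derived quantities in the config above are not independent. A minimal sketch (not the report's actual code, just an illustration of the standard PPO arithmetic these hyperparameters imply) showing how batch_size and minibatch_size follow from num_envs, num_steps, and num_minibatches, plus how clip_coef bounds the surrogate objective:

```python
# Config values taken from the run table above.
num_envs = 4
num_steps = 128
num_minibatches = 4
total_timesteps = 500_000
clip_coef = 0.2

# One rollout collects num_envs * num_steps transitions.
batch_size = num_envs * num_steps               # 512, matches the table
minibatch_size = batch_size // num_minibatches  # 128, matches the table
num_updates = total_timesteps // batch_size     # number of rollout/update cycles

# PPO's clipped surrogate: the probability ratio
# r = pi_new(a|s) / pi_old(a|s) is clipped to [1 - clip_coef, 1 + clip_coef],
# and the pessimistic (minimum) of the two terms is maximized.
def clipped_objective(ratio, advantage, clip=clip_coef):
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip), 1 - clip) * advantage
    return min(unclipped, clipped)

print(batch_size, minibatch_size, num_updates)
print(clipped_objective(1.5, 1.0))   # ratio clipped down to 1 + clip_coef
print(clipped_objective(0.5, -1.0))  # ratio clipped up to 1 - clip_coef
```

With these settings each update runs update_epochs=4 passes over the 512-transition batch in minibatches of 128.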



