
MontezumaRevenge: CleanRL's PPO + RND

Created on August 24 | Last edited on August 24

[Figure: Episodic Return vs. Step (x-axis up to 1.5G steps) and Episodic Return vs. Time (hours, x-axis up to about 250); y-axis 0 to 14000]
Run set: CleanRL's ppo_rnd_envpool.py (1 run)

State: Finished
User: yooceii
Runtime: 10d 16h 51m 24s
Sweep: -

Config:
anneal_lr: true
batch_size: 16384
capture_video: false
clip_coef: 0.1
clip_vloss: true
cuda: true
ent_coef: 0.001
env_id: MontezumaRevenge-v5
exp_name: ppo_rnd
ext_coef: 2
gae: true
gae_lambda: 0.95
gamma: 0.999
int_coef: 1
int_gamma: 0.99
learning_rate: 0.0001
max_grad_norm: 0.5
minibatch_size: 4096

The remaining config columns (adv_norm_fullbatch, alpha, autotune, aux_batch_rollouts, backend, beta_clone, buffer_size, device_ids, discount, e_auxiliary, e_policy, end_e, env, eval_every, eval_freq, expl_noise, exploration_fraction, exploration_noise, learning_starts, load_model, max_timesteps, n_atoms, n_aux_grad_accum, n_eval_episodes, n_iteration, noise_clip) are unset ("-") for this run.
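The config above uses two reward streams: the extrinsic game reward, discounted with gamma = 0.999, and the RND intrinsic bonus, discounted with int_gamma = 0.99, both with gae_lambda = 0.95. The policy advantage is then a weighted sum with ext_coef = 2 and int_coef = 1. The NumPy sketch below illustrates that combination. It is not CleanRL's code: the function names `gae` and `combined_advantages` are hypothetical, the convention that `dones[t]` marks an episode ending after step t is an assumption, and treating the intrinsic stream as non-episodic (ignoring episode boundaries) follows the usual RND setup but is assumed here for this run.

```python
import numpy as np

# Hyperparameters copied from the run config above.
GAMMA = 0.999      # gamma: extrinsic discount
INT_GAMMA = 0.99   # int_gamma: intrinsic discount
GAE_LAMBDA = 0.95  # gae_lambda
EXT_COEF = 2.0     # ext_coef
INT_COEF = 1.0     # int_coef


def gae(rewards, values, next_value, dones, gamma, lam, episodic=True):
    """Generalized Advantage Estimation over a rollout of shape (T, N).

    Hypothetical helper. dones[t] = 1 means the episode ended after step t
    (assumed convention). With episodic=False, episode boundaries are ignored,
    so the return keeps bootstrapping across resets.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    next_value = np.asarray(next_value, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    T = rewards.shape[0]
    advantages = np.zeros(rewards.shape, dtype=np.float64)
    last_gae = np.zeros(next_value.shape, dtype=np.float64)
    for t in reversed(range(T)):
        # Bootstrap from the value of the next state in the rollout.
        next_v = next_value if t == T - 1 else values[t + 1]
        # Mask the bootstrap and the GAE carry at episode ends (episodic only).
        nonterminal = 1.0 - dones[t] if episodic else np.ones_like(next_value)
        delta = rewards[t] + gamma * next_v * nonterminal - values[t]
        last_gae = delta + gamma * lam * nonterminal * last_gae
        advantages[t] = last_gae
    return advantages


def combined_advantages(ext_r, int_r, ext_v, int_v, ext_next_v, int_next_v, dones):
    """Weighted sum of extrinsic and intrinsic advantages (hypothetical helper)."""
    ext_adv = gae(ext_r, ext_v, ext_next_v, dones, GAMMA, GAE_LAMBDA, episodic=True)
    int_adv = gae(int_r, int_v, int_next_v, dones, INT_GAMMA, GAE_LAMBDA, episodic=False)
    return EXT_COEF * ext_adv + INT_COEF * int_adv
```

For example, a 2-step rollout with one environment, unit rewards in both streams, zero value estimates, and no episode ends gives combined advantages of 2 * (1 + 0.999 * 0.95) + (1 + 0.99 * 0.95) ≈ 5.84 at the first step and 3 at the last. Keeping the two streams separate lets each use its own discount and terminal handling before they are mixed.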