[Panel: reward for different reward_clip]
[Panel: avg_crash_reward for different reward clipping]
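The panels compare runs under different reward clipping settings. As a rough illustration of what such a setting typically does, the sketch below clamps each per-step reward to the range [-reward_clip, reward_clip]. This is an assumption: the report does not show how the clip is applied in these runs, and the function name `clip_reward` and the sample reward values are hypothetical.

```python
def clip_reward(reward: float, reward_clip: float) -> float:
    """Clamp a scalar reward to [-reward_clip, reward_clip].

    Assumed symmetric clipping; the report does not confirm the exact rule.
    """
    if reward_clip <= 0:
        raise ValueError("reward_clip must be positive")
    return max(-reward_clip, min(reward_clip, reward))


# Hypothetical per-step rewards, including one large crash penalty.
rewards = [0.5, -2.0, -50.0, 3.0]

for clip in (1.0, 10.0):
    clipped = [clip_reward(r, clip) for r in rewards]
    print(f"reward_clip={clip}: {clipped}")
```

Under this assumption, a tighter clip flattens large penalties such as a crash, so a crash-related reward metric can look very different from one clip value to the next even when the underlying behavior is similar.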
Run set: 8 runs (8 visualized), showing 1-8 of 8

Per-run values (Notes and Sweep are "-" for every run; no Name or Created values appear):

#  State     User            Tags         Runtime        cli_args.experiment
1  Finished  andrewzhang505  runner, sf2  1d 48m 44s     03_baseline_see_3333
2  Finished  andrewzhang505  runner, sf2  1d 30m 12s     02_baseline_see_2222
3  Finished  andrewzhang505  runner, sf2  1d 1h 5m 36s   01_baseline_see_1111
4  Finished  andrewzhang505  runner, sf2  1d 2h 38m 35s  00_baseline_see_0
5  Finished  andrewzhang505  runner, sf2  1d 3h 5m 11s   02_baseline_see_2222
6  Finished  andrewzhang505  runner, sf2  1d 3h 3m 41s   03_baseline_see_3333
7  Finished  andrewzhang505  runner, sf2  1d 44m 48s     01_baseline_see_1111
8  Finished  andrewzhang505  runner, sf2  1d 57m 42s     00_baseline_see_0

Config values shared by all 8 runs:

actor_critic_share_weights: false
actor_worker_gpus: []
adam_beta1: 0.9
adam_beta2: 0.999
adam_eps: 0.000001
adaptive_stddev: false
algo: APPO
anneal_collision_steps: 0
async_rl: true
batch_size: 1024
batched_sampling: false
benchmark: false
cli_args.actor_critic_share_weights: false
cli_args.adaptive_stddev: false
cli_args.algo: APPO
cli_args.anneal_collision_steps: 0
cli_args.async_rl: -
cli_args.batch_size: 1024
cli_args.device: -
cli_args.encoder_custom: quad_multi_encoder
cli_args.env: quadrotor_multi
cli_args.experiments_root: -
cli_args.exploration_loss_coeff: 0
cli_args.gae_lambda: 1
cli_args.hidden_size: 16
cli_args.learning_rate: 0.0001
cli_args.max_grad_norm: 5
cli_args.max_policy_lag: 100000000
cli_args.neighbor_obs_type: none
cli_args.nonlinearity: tanh
cli_args.num_envs_per_worker: 4
cli_args.num_workers: 36
cli_args.policy_initialization: xavier_uniform
cli_args.ppo_clip_value: 5
cli_args.quads_collision_falloff_radius: 4
cli_args.quads_collision_hitbox_radius: 2
cli_args.quads_collision_reward: 5
cli_args.quads_collision_smooth_max_penalty: 10
cli_args.quads_episode_duration: 15
cli_args.quads_formation_size: 0
cli_args.quads_local_coeff: 1
Created with ❤️ on Weights & Biases.
https://wandb.ai/andrewzhang505/sample_factory/reports/Quad-Swarm-RL-Reward-Clipping--VmlldzoyMzA2NTQw