MicroRTS U-net Self-play

Created on April 8 | Last edited on May 16
Experiment tags:
  • Shaped rewards: benchmark_4706d8d host_192-9-146-21 branch_selfplay v0.0.9
  • Win-loss, gamma 0.999: benchmark_4706d8d host_152-70-115-196 branch_selfplay v0.0.9
  • Shaped rewards decay, gamma 0.999: benchmark_08664bf host_192-9-250-82 branch_selfplay v0.0.9
  • Shaped rewards decay, gamma decay 0.99-0.999: benchmark_f7c6f26 host_192-9-151-120 branch_selfplay v0.0.9
  • Shaped rewards decay, gamma decay, 4000 train max_steps, 300,000 save_steps, 6000 swap_steps: benchmark_9ba0ab5 host_192-9-155-233 branch_main v0.0.9

Plot: x-axis global_step (0 to 250M), y-axis from -0.5 to 1. Legend:
env: Microrts-selfplay-unet-decay, microrts_reward_decay_callback: true, algo_hyperparams.gamma: 0.999, env_hyperparams.make_kwargs.max_steps: 2000
env: Microrts-selfplay-unet-decay, microrts_reward_decay_callback: true, algo_hyperparams.gamma: -, env_hyperparams.make_kwargs.max_steps: 4000
env: Microrts-selfplay-unet-decay, microrts_reward_decay_callback: true, algo_hyperparams.gamma: -, env_hyperparams.make_kwargs.max_steps: 2000
env: Microrts-selfplay-unet, microrts_reward_decay_callback: false, algo_hyperparams.gamma: -, env_hyperparams.make_kwargs.max_steps: 2000
env: Microrts-selfplay-unet-winloss, microrts_reward_decay_callback: false, algo_hyperparams.gamma: 0.999, env_hyperparams.make_kwargs.max_steps: 2000
Run sets (run counts):
  • Combined plot: 15
  • Shaped rewards: benchmark_4706d8d host_192-9-146-21 branch_selfplay v0.0.9: 3
  • Win-loss, gamma 0.999: benchmark_4706d8d host_152-70-115-196 branch_selfplay v0.0.9: 3
  • Shaped rewards decay, gamma 0.999: benchmark_08664bf host_192-9-250-82 branch_selfplay v0.0.9: 3
  • Shaped rewards decay, gamma decay 0.99-0.999: benchmark_f7c6f26 host_192-9-151-120 branch_selfplay v0.0.9: 3
  • Shaped rewards decay, gamma decay, 4000 train max_steps, 300,000 save_steps, 6000 swap_steps: benchmark_9ba0ab5 host_192-9-155-233 branch_main v0.0.9: 3
  • Unlabeled run set: 3