rl-algo-impls MicroRTS Training
Training curve of an agent trained with the sgoodfriend/rl-algo-impls repo using the Microrts-selfplay-dc-phases hyperparameters:
- PPO training with phased rewards (see the reward-schedule sketch after the map list below):
- 60M steps: half sparse WinLoss rewards, half dense rewards based on the score function from Clemens Winter's CodeCraft (60M transition steps)
- 30M steps: sparse WinLoss rewards only (60M transition steps)
- The dense-reward phase used an entropy coefficient of 0.01; the sparse-reward phase used 0.001.
- Self-play training with 6 environments playing against the latest policy and 12 environments playing against model checkpoints from the last 10 million steps (see the opponent-sampling sketch after the map list below)
- Trained with 6 maps:
- 16x16/basesWorkers16x16A.xml
- 16x16/TwoBasesBarracks16x16.xml
- 8x8/basesWorkers8x8A.xml
- 8x8/FourBasesWorkers8x8.xml
- NoWhereToRun9x8.xml
- 16x16/EightBasesWorkers16x16.xml (not a public competition map)
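
The phased-reward setup above can be thought of as a step-dependent schedule over reward weights and the entropy coefficient. Below is a minimal Python sketch assuming a 0.5/0.5 weighting between the sparse WinLoss and dense score rewards during the first phase; the function and key names are hypothetical and not the repo's actual API:

```python
# Hypothetical sketch of the phased-reward schedule described above; the weight
# names and the 0.5/0.5 split are assumptions, not rl-algo-impls code.
def phase_schedule(global_step: int) -> dict:
    PHASE1_STEPS = 60_000_000  # first phase: mixed sparse WinLoss + dense score rewards
    if global_step < PHASE1_STEPS:
        # Half-weight sparse WinLoss reward, half-weight dense CodeCraft-style score reward
        return {"winloss_weight": 0.5, "dense_weight": 0.5, "ent_coef": 0.01}
    # Remaining 30M steps: sparse WinLoss reward only, lower entropy coefficient
    return {"winloss_weight": 1.0, "dense_weight": 0.0, "ent_coef": 0.001}


def combined_reward(winloss_reward: float, dense_reward: float, global_step: int) -> float:
    # Mix the two reward streams according to the current phase
    w = phase_schedule(global_step)
    return w["winloss_weight"] * winloss_reward + w["dense_weight"] * dense_reward
```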
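
The self-play environment split (6 environments against the latest policy, 12 against recent checkpoints) can be sketched as an opponent-assignment step run at checkpoint time. This is an illustrative sketch only; the function name, snapshot format, and sampling strategy are assumptions, not the rl-algo-impls implementation:

```python
import random

# Hypothetical sketch of the self-play opponent assignment described above.
def assign_opponents(latest_policy, snapshots, current_step,
                     n_latest=6, n_recent=12, window_steps=10_000_000):
    """Return one opponent per environment: n_latest envs face the latest policy
    and n_recent envs face checkpoints saved within the last `window_steps`."""
    recent = [policy for step, policy in snapshots
              if current_step - step <= window_steps]
    pool = recent if recent else [latest_policy]  # fall back early in training
    return [latest_policy] * n_latest + [random.choice(pool) for _ in range(n_recent)]
```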