rl-algo-impls MicroRTS Training

Created on June 10 | Last edited on June 10
Model used in colab_microrts_demo.ipynb
Training curve of an agent trained with the sgoodfriend/rl-algo-impls repo using the Microrts-selfplay-dc-phases hyperparameters:
PPO training with phased rewards:
90M steps: Dense rewards using Farama-Foundation/MicroRTS-Py weights
60M steps: Half sparse WinLoss rewards, half dense rewards based on the score function from Clemens Winter's CodeCraft (60M transition steps)
30M steps: Sparse WinLoss rewards only (60M transition steps)
Training on dense rewards used an entropy coefficient of 0.01. Training on sparse rewards used an entropy coefficient of 0.001.
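The phase schedule above can be sketched as a lookup keyed by training step. This is a minimal illustration and not the rl-algo-impls implementation: the `Phase` and `phase_at` names are hypothetical, phases switch at hard boundaries (any gradual transition between phases is not modeled), and the entropy coefficient for the mixed phase is an assumption, since the report only states values for dense and sparse rewards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    steps: int            # length of the phase in training steps
    dense_weight: float   # share of the reward taken from the dense signal
    sparse_weight: float  # share of the reward taken from sparse WinLoss
    ent_coef: float       # PPO entropy coefficient used during the phase


# Step counts and reward mixes follow the report.
PHASES = (
    Phase(90_000_000, 1.0, 0.0, 0.01),
    # ASSUMPTION: the report does not state the entropy coefficient for the
    # mixed phase; the dense-reward value is reused here for illustration.
    Phase(60_000_000, 0.5, 0.5, 0.01),
    Phase(30_000_000, 0.0, 1.0, 0.001),
)


def phase_at(step: int) -> Phase:
    """Return the phase containing `step`, clamping to the final phase."""
    if step < 0:
        raise ValueError("step must be non-negative")
    start = 0
    for phase in PHASES:
        if step < start + phase.steps:
            return phase
        start += phase.steps
    return PHASES[-1]


def mixed_reward(step: int, dense: float, win_loss: float) -> float:
    """Blend a dense reward and a sparse WinLoss reward for this step."""
    phase = phase_at(step)
    return phase.dense_weight * dense + phase.sparse_weight * win_loss
```

A training loop could call `phase_at(global_step)` once per rollout to pick the entropy coefficient and `mixed_reward` per transition to blend the two signals.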
Network: DoubleCone(4, 6, 4) architecture from FLG's Lux AI Season 2 solution
Self-play training with 6 environments playing against the latest model and 12 environments playing against models from the last 10 million steps
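The self-play split can be sketched as an opponent pool that keeps only checkpoints from a sliding window of recent steps. This is a rough sketch under assumptions and not the repo's actual code: `OpponentPool` and its methods are hypothetical names, checkpoints are represented by plain strings, and past opponents are sampled uniformly from the window (the report does not say how they are chosen).

```python
import random
from collections import deque

LATEST_ENVS = 6            # environments facing the latest model
PAST_ENVS = 12             # environments facing recent past models
WINDOW_STEPS = 10_000_000  # how far back past opponents may come from


class OpponentPool:
    def __init__(self, window_steps: int = WINDOW_STEPS) -> None:
        self.window_steps = window_steps
        self._checkpoints = deque()  # (step, name) pairs, oldest first

    def add(self, step: int, name: str) -> None:
        """Register a checkpoint and evict those older than the window."""
        self._checkpoints.append((step, name))
        cutoff = step - self.window_steps
        while self._checkpoints[0][0] < cutoff:
            self._checkpoints.popleft()

    def assign(self, rng: random.Random) -> list:
        """Return one opponent name per environment (18 in total)."""
        if not self._checkpoints:
            raise ValueError("no checkpoints registered")
        latest = self._checkpoints[-1][1]
        recent = [name for _, name in self._checkpoints]
        # ASSUMPTION: uniform sampling with replacement over the window.
        past = [rng.choice(recent) for _ in range(PAST_ENVS)]
        return [latest] * LATEST_ENVS + past
```

Calling `assign` after each checkpoint save would refresh the opponents for all 18 environments; the first 6 entries always point at the newest checkpoint.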
Trained with 6 maps:
16x16/basesWorkers16x16A.xml
16x16/TwoBasesBarracks16x16.xml
8x8/basesWorkers8x8A.xml
8x8/FourBasesWorkers8x8.xml
NoWhereToRun9x8.xml
16x16/EightBasesWorkers16x16.xml (Not public competition map)