Lux AI Season 2 Training
Created on April 25|Last edited on April 25
Submission Run: ppo-LuxAI_S2-v0-A10-S1-2023-04-23T01:22:36.160913
- PPO
- 20M steps
- Spike learning rate schedule: starts at 4e-6, rises linearly to a peak of 4e-4 at 2M steps, then drops linearly to 4e-8 by the end of training
- Self-play and historical opponents, chosen at random from a queue of 10M steps
- Reward varied linearly over the course of training
  - Started with rewards for actions taken:
    - ice, water, ore, and metal generation
    - penalty for factory loss
  - Ended with rewards for game result and lichen generation
- U-Net-like model whose value output is computed with an AvgPool2D over the bottom of the U-Net
- Invalid Action Masking
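
The spike learning rate and the linearly varying reward listed above can both be read as linear interpolation over training progress. Below is a minimal sketch in plain Python under that assumption. The function names (`lerp`, `spike_lr`, `reward_weights`) and the example reward keys and weights are hypothetical illustrations, not taken from the actual training code; only the step counts and learning-rate values come from the list above.

```python
def lerp(start, end, frac):
    """Linearly interpolate between start and end for frac in [0, 1]."""
    return start + (end - start) * frac


def spike_lr(step, total_steps=20_000_000, peak_step=2_000_000,
             start_lr=4e-6, peak_lr=4e-4, end_lr=4e-8):
    """Spike schedule: linear ramp up to peak_lr, then linear decay to end_lr."""
    step = min(max(step, 0), total_steps)  # clamp to the training range
    if step <= peak_step:
        return lerp(start_lr, peak_lr, step / peak_step)
    decay_frac = (step - peak_step) / (total_steps - peak_step)
    return lerp(peak_lr, end_lr, decay_frac)


def reward_weights(step, start_weights, end_weights, total_steps=20_000_000):
    """Blend two reward-weight dicts linearly over training.

    Assumption: the reward is a weighted sum of components, and a component
    missing from one dict has weight 0 at that end of training.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    keys = set(start_weights) | set(end_weights)
    return {
        k: lerp(start_weights.get(k, 0.0), end_weights.get(k, 0.0), frac)
        for k in keys
    }


# Hypothetical weights: early training rewards resource generation and
# penalizes factory loss; late training rewards game result and lichen.
START = {"ice": 1.0, "water": 1.0, "ore": 1.0, "metal": 1.0, "factory_loss": -1.0}
END = {"game_result": 1.0, "lichen": 1.0}

if __name__ == "__main__":
    for s in (0, 1_000_000, 2_000_000, 11_000_000, 20_000_000):
        print(s, spike_lr(s), reward_weights(s, START, END))
```

Halfway through training, each hypothetical early-stage weight has decayed to half its starting value while the late-stage weights have risen to half their final value, so the agent is never switched abruptly from one objective to the other.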
Two-Stage Training: Small Map + Transfer to Full Env
An interesting but less successful experiment was to first train the model on a simpler setting (a medium-size 32x32 map with 1 factory per player) and then continue training that model on the full environment: