Lux AI Season 2 Training

Created on April 25 | Last edited on April 25

Submission Run: ppo-LuxAI_S2-v0-A10-S1-2023-04-23T01:22:36.160913

  • PPO
  • 20M steps
    • Spike learning-rate schedule: starts at 4e-6, rises linearly to a peak of 4e-4 at 2M steps, then falls linearly to 4e-8 at the end
  • Self-play and historical opponents, chosen at random from a queue covering 10M steps
  • Reward varied linearly over training
    • Started with rewards for actions taken:
      • ice, water, ore, and metal generation
      • penalty for factory loss
    • Ended with rewards for game result and lichen generation
  • U-Net-like model with a value output computed via an AvgPool2D from the bottom of the U-Net
  • Invalid Action Masking
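
The two linear schedules in the list above can be sketched as plain functions of the training step. This is a minimal illustration, not the code used for the run: the function names `spike_lr` and `blended_reward` are hypothetical, and it assumes the reward shift spans the full 20M steps and is applied as a weighted sum of an early shaped reward and a late result-based reward.

```python
def spike_lr(step, total_steps=20_000_000, peak_step=2_000_000,
             start_lr=4e-6, peak_lr=4e-4, end_lr=4e-8):
    """Piecewise-linear "spike": ramp up to peak_lr, then decay to end_lr.

    Default values come from the run description; the function itself
    is an illustrative sketch.
    """
    # Clamp so out-of-range steps return the endpoint values.
    step = min(max(step, 0), total_steps)
    if step <= peak_step:
        # Warm-up segment: start_lr -> peak_lr over the first peak_step steps.
        frac = step / peak_step
        return start_lr + frac * (peak_lr - start_lr)
    # Decay segment: peak_lr -> end_lr over the remaining steps.
    frac = (step - peak_step) / (total_steps - peak_step)
    return peak_lr + frac * (end_lr - peak_lr)


def blended_reward(shaped, final, step, total_steps=20_000_000):
    """Linearly shift weight from the early reward to the late reward.

    `shaped` stands for the early reward (resource generation, factory-loss
    penalty) and `final` for the late reward (game result, lichen).
    Assumption: the interpolation covers the whole run.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - progress) * shaped + progress * final


if __name__ == "__main__":
    for s in (0, 1_000_000, 2_000_000, 11_000_000, 20_000_000):
        print(s, spike_lr(s), blended_reward(1.0, 0.0, s))
```

In a PyTorch training loop, a function like `spike_lr` could be applied by writing its value into each optimizer parameter group at every update; that wiring is left out here to keep the sketch dependency-free.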



Two-Stage Training: Small Map + Transfer to Full Env

A less successful side experiment was to first train the model on a simpler setting (a medium-size 32x32 map with 1 factory per player) and then continue training that model on the full environment.