Lux AI Season 2 Training

Created on April 25 | Last edited on April 25
Kaggle Notebook: https://www.kaggle.com/code/sgoodfriend/luxs2-rl-algo-impls-v0-0-12-rl-ppo-u-net
Submission Run: ppo-LuxAI_S2-v0-A10-S1-2023-04-23T01:22:36.160913PPO
20M training steps
Spike learning-rate schedule: starts at 4e-6, rises linearly to a peak of 4e-4 at 2M steps, then decreases linearly to 4e-8 by the end of training
Self-play and historical opponents, randomly chosen from a queue spanning 10M steps
Reward function varied linearly over the course of training
Started with rewards for actions taken:
Ice, water, ore, and metal generation
Penalty for factory loss
Ended with rewards for the game result and lichen generation
U-Net-like model, with the value output computed by an AvgPool2D over the bottom (bottleneck) layer of the U-Net
Invalid Action Masking
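Both the spike learning rate and the linearly varied reward are piecewise-linear functions of the training step. The sketch below assumes a single helper that interpolates between (step, value) knots. The learning-rate knots come from the list above; the helper names, the split into `shaped` and `outcome` reward terms, and the assumption that the reward blend spans all 20M steps are hypothetical and not taken from the submission's code.

```python
from bisect import bisect_right
from typing import Dict, Sequence, Tuple

TOTAL_STEPS = 20_000_000
PEAK_STEP = 2_000_000


def piecewise_linear(knots: Sequence[Tuple[float, float]], x: float) -> float:
    """Linearly interpolate y at x from (x, y) knots sorted by x; clamp outside the range."""
    xs = [k[0] for k in knots]
    if x <= xs[0]:
        return knots[0][1]
    if x >= xs[-1]:
        return knots[-1][1]
    i = bisect_right(xs, x)  # index of the first knot strictly to the right of x
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def spike_learning_rate(step: int) -> float:
    # 4e-6 at step 0 -> 4e-4 at 2M steps -> 4e-8 at 20M steps
    return piecewise_linear([(0, 4e-6), (PEAK_STEP, 4e-4), (TOTAL_STEPS, 4e-8)], step)


def reward_weights(step: int) -> Dict[str, float]:
    # Hypothetical blend: all weight on shaped rewards at step 0,
    # all weight on outcome rewards at the final step.
    progress = piecewise_linear([(0, 0.0), (TOTAL_STEPS, 1.0)], step)
    return {"shaped": 1.0 - progress, "outcome": progress}


def blended_reward(step: int, shaped: float, outcome: float) -> float:
    # shaped: hypothetical per-step reward for ice/water/ore/metal generation
    #         minus a factory-loss penalty
    # outcome: hypothetical reward for the game result plus lichen generation
    w = reward_weights(step)
    return w["shaped"] * shaped + w["outcome"] * outcome
```

Keeping both schedules on one interpolation helper makes it easy to add or move knots (for example, a different peak step) without touching the training loop.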
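The opponent pool can be sketched as a queue of policy snapshots that evicts anything older than a 10M-step window. The class name, the uniform choice among snapshots, and the `selfplay_prob` split between the current policy and historical snapshots are assumptions for illustration; the report does not state how often each kind of opponent was chosen.

```python
import random
from collections import deque
from typing import Any, Deque, Optional, Tuple


class HistoricalOpponentQueue:
    """Hypothetical opponent pool holding policy snapshots from the last `window_steps`."""

    def __init__(self, window_steps: int = 10_000_000, seed: Optional[int] = None):
        self.window_steps = window_steps
        self.snapshots: Deque[Tuple[int, Any]] = deque()
        self.rng = random.Random(seed)

    def add(self, step: int, policy: Any) -> None:
        self.snapshots.append((step, policy))
        # Evict snapshots that fall outside the window relative to the newest step.
        while self.snapshots and step - self.snapshots[0][0] > self.window_steps:
            self.snapshots.popleft()

    def sample(self, current_policy: Any, selfplay_prob: float = 0.5) -> Any:
        # Self-play against the current policy, or a uniformly random historical snapshot.
        if not self.snapshots or self.rng.random() < selfplay_prob:
            return current_policy
        return self.rng.choice(self.snapshots)[1]


queue = HistoricalOpponentQueue(seed=0)
for step in (0, 5_000_000, 10_000_000, 15_000_000):
    queue.add(step, f"policy@{step}")
opponent = queue.sample(current_policy="current", selfplay_prob=0.5)
```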
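Invalid action masking is commonly implemented by setting the logits of illegal actions to negative infinity before the softmax, so those actions receive zero probability and can never be sampled. The pure-Python sketch below shows the idea for a single categorical action; the function name is hypothetical, and a real policy would apply the same mask per unit and per action component on tensors.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits where actions with mask False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Replace invalid logits with -inf so exp() maps them to exactly 0.0.
    masked = [logit if valid else -math.inf for logit, valid in zip(logits, mask)]
    max_logit = max(masked)  # finite, since at least one action is valid
    exps = [math.exp(x - max_logit) for x in masked]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

The same mask should be used when computing log-probabilities for the PPO loss, not only when sampling, so the policy gradient never assigns credit to actions the environment would reject.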
Run set (1 run)
Two-Stage Training: Small Map + Transfer to Full Env
An interesting but less successful aside was to first train the model in a simpler setting (a medium-size 32x32 map with one factory per player) and then continue training that model on the full environment:
Run set (2 runs)