
Lux S2: Action Masking and Prevention

Created on May 9 | Last edited on May 16

[Two charts, metric names not recoverable: x-axis global_step (20M to 100M); y-axis ranges 0 to 14000 and 0 to 60000]
Run set
8

ppo-LuxAI_S2-v0-dc-A10-S1-2023-05-16T20:32:44.935223
c2c3b49 Use sync vec env for Lux
0402f24 Add metal remaining to assign as observation
431bf31 30 million step Lux
5bd3dbb gamma and gae_lambda no longer 1
c117dba Recharge on an empty queue is a waste of power
4faa853 max_move_repeats accounts for enqueuing power cost
e154efb Respect bid_std_dev setting
0855050 Support for factory placement through Dict Space
17f4706 Factor RolloutGenerator into top-level
c4f37eb selfPlayWrapper -> self_play_wrapper
6ea561b RolloutGenerator and SyncStepRolloutGenerator
f57a6ef Support ent_coef in LuxHyperparamTransitions
*0d1f0fd Add value_shape to ACN abstract class
*4cc095f Metric vf_coef
*d402515 [LuxS2] ent_coef by phases
*2e65826 Double cone network (#25)
51968fc Always bid 0
96d67d0 Fix cancel_action/move using the wrong unit
3017445 score_threshold to determine best in eval
* Also run in prior runs
  • Multiple reward outputs, multiple critic values, and PPO support for these multiple outputs
Is the double-cone-network branch rebased off of the reward changes?
This run tries out the new model architecture, DoubleCone(4,6,4), used by the best-performing RL solution in Lux Season 2. It uses so much memory that the minibatch size is dropped to 256 (about one-quarter of the unet minibatch size).
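The multiple-reward, multiple-critic setup mentioned in the notes above could look roughly like the sketch below. It is an illustration, not the code from these commits. The function names (`gae_advantages`, `combined_advantages`) and the per-head `weights` are hypothetical, and the weighted sum used to merge per-head advantages into one policy advantage is an assumption about how such a setup might be wired into PPO. The `gamma` and `gae_lambda` arguments are the usual GAE parameters, which one of the commits above moves away from 1.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    gamma: float,
    gae_lambda: float,
) -> List[float]:
    """Generalized advantage estimation for ONE reward component.

    Assumes a single rollout segment with no episode boundary inside it.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error for this reward component.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * gae_lambda * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def combined_advantages(
    rewards_by_head: Sequence[Sequence[float]],
    values_by_head: Sequence[Sequence[float]],
    last_values: Sequence[float],
    weights: Sequence[float],  # hypothetical per-head weights (assumption)
    gamma: float,
    gae_lambda: float,
) -> List[float]:
    """Weighted sum of per-head advantages, one value per timestep."""
    per_head = [
        gae_advantages(r, v, lv, gamma, gae_lambda)
        for r, v, lv in zip(rewards_by_head, values_by_head, last_values)
    ]
    steps = len(per_head[0])
    return [
        sum(w * adv[t] for w, adv in zip(weights, per_head))
        for t in range(steps)
    ]
```

In this sketch each critic head would still be trained against its own return (advantage plus value for that head). Only the policy gradient sees the combined advantage.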
a06bc54 Double ore/metal reward; half ice/water reward
cec4edf repeat -> num_executions
9aa0457 Power gen is before step increment
0c3f351 Expected power gen range
3ec489c Revert "Revert "Add power generation verification""
4158f3c Revert "Add power generation verification"
1945472 Add power generation verification
f13f662 Tweak logging
a281543 Some end-of-game checks
3380841 Compute transfer amount by target capacity
8fb2397 Found transfers not being invalid masked away
668a34e += modifies numpy arrays in place?
53e32e8 Limit move repeat with expected power usage
e14d2ff Limit repeats of all actions
349332e Forgot import
a508765 Update water pickup action for min amount
e794435 Don’t pickup last 50 days of water
7fb940f Lux S2 constant learning rate
502e1b4 Tweak debug hyperparams
668ca91 First factories get 2 heavy robots
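Several commits above concern invalid-action masking (for example, transfers that were not being masked away). A minimal sketch of the general technique follows, assuming a flat list of action logits and a boolean validity mask. The function name `masked_softmax` is hypothetical and this is not the repository's implementation. Invalid actions get a logit of negative infinity, so they receive exactly zero probability and can never be sampled.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax over logits, with invalid actions (mask False) forced to 0.

    Raises ValueError if no action is valid, since the distribution
    would be undefined.
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf so that exp() maps them to exactly 0.0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: the second action (say, an invalid transfer) is masked out.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
```

Masking at the logit level keeps the remaining probabilities properly normalized, so the log-probabilities PPO uses are computed over valid actions only.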