Lux S2: Action Masking and Prevention
Created on May 9 | Last edited on May 16
ppo-LuxAI_S2-v0-dc-A10-S1-2023-05-16T20:32:44.935223
c2c3b49 Use sync vec env for Lux
0402f24 Add metal remaining to assign as observation
431bf31 30 million step Lux
5bd3dbb gamma and gae_lambda no longer 1
c117dba Recharge on an empty queue is a waste of power
4faa853 max_move_repeats accounts for enqueuing power cost
e154efb Respect bid_std_dev setting
0855050 Support for factory placement through Dict Space
17f4706 Factor RolloutGenerator into top-level
c4f37eb selfPlayWrapper -> self_play_wrapper
6ea561b RolloutGenerator and SyncStepRolloutGenerator
f57a6ef Support ent_coef in LuxHyperparamTransitions*
0d1f0fd Add value_shape to ACN abstract class*
4cc095f Metric vf_coef*
d402515 [LuxS2] ent_coef by phases*
2e65826 Double cone network (#25)
51968fc Always bid 0
96d67d0 Fix cancel_action/move using the wrong unit
3017445 score_threshold to determine best in eval
* Also run in prior runs
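Commit 4faa853 says `max_move_repeats` accounts for enqueuing power cost. A minimal sketch of that arithmetic, assuming the robot pays a one-time action-queue power cost when the action is enqueued and then a fixed power cost per executed move (rubble ignored). The function and parameter names are hypothetical, and the example costs are illustrative rather than read from the Lux S2 env config.

```python
from typing import Optional


def max_move_repeats(
    power: int,
    move_cost: int,
    queue_cost: int,
    cap: Optional[int] = None,
) -> int:
    """Hypothetical helper: how many moves a robot can afford after enqueuing.

    Assumes enqueuing costs `queue_cost` power once and each executed move
    costs `move_cost` power. Rubble-dependent move costs are ignored here.
    """
    if move_cost <= 0:
        raise ValueError("move_cost must be positive")
    usable = power - queue_cost  # power left after paying to enqueue
    if usable < move_cost:
        return 0  # cannot afford even a single move
    repeats = usable // move_cost
    return repeats if cap is None else min(repeats, cap)


# Illustrative numbers only (not taken from the env config).
print(max_move_repeats(power=50, move_cost=1, queue_cost=1))            # 49
print(max_move_repeats(power=300, move_cost=20, queue_cost=10))         # 14
print(max_move_repeats(power=300, move_cost=20, queue_cost=10, cap=5))  # 5
```

Without subtracting the queue cost first, the second example would report 15 moves, one more than the robot can pay for.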
- Multiple reward outputs, multiple critic values, and PPO support for these multiple outputs
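Supporting multiple reward outputs with one critic value per output (compare `0d1f0fd Add value_shape to ACN abstract class`) mostly means the advantage computation has to run per reward component. A minimal NumPy sketch of generalized advantage estimation vectorized over a trailing reward axis, assuming rewards and values are shaped `(T, K)` for `T` steps and `K` reward heads, and `dones[t]` marks an episode ending after step `t`. The function name and shapes are illustrative, not the repo's PPO code.

```python
import numpy as np


def compute_gae(rewards, values, next_value, dones, gamma=0.99, gae_lambda=0.95):
    """GAE over K reward heads at once.

    rewards, values: (T, K) arrays; next_value: (K,) bootstrap value after
    the last step; dones: (T,) with 1.0 where the episode ended after step t.
    Returns (advantages, returns), both shaped (T, K).
    """
    T = rewards.shape[0]
    advantages = np.zeros(rewards.shape, dtype=np.float64)
    last_gae = np.zeros(rewards.shape[1:], dtype=np.float64)
    for t in reversed(range(T)):
        next_v = next_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]  # stop bootstrapping across episode ends
        # One-step TD error per reward head.
        delta = rewards[t] + gamma * next_v * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        last_gae = delta + gamma * gae_lambda * not_done * last_gae
        advantages[t] = last_gae
    returns = advantages + values  # critic targets, one per head
    return advantages, returns
```

The policy loss still needs one scalar advantage per step, so the per-head advantages have to be reduced somehow, for example with a weighted sum; how these runs combine them is not stated here.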
Is the double-cone-network branch rebased off of the reward changes?
This run tries out the new DoubleCone(4,6,4) model architecture used by the best-performing RL solution in Lux Season 2. It uses a ton of memory, so the minibatch size is dropped to 256 (about one-quarter of the unet run's).
a06bc54 Double ore/metal reward; half ice/water reward
cec4edf repeat -> num_executions
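Commit a06bc54 doubles the ore and metal reward and halves the ice and water reward. A minimal sketch of that change as multiplicative scales over per-resource weights, assuming the shaped reward is a weighted sum of per-step resource gains. The base weights and the resource-delta dictionary are hypothetical placeholders, since the actual values are not given here.

```python
# Hypothetical base weights; the real values are not stated in the commit log.
BASE_WEIGHTS = {"ice": 1.0, "water": 1.0, "ore": 1.0, "metal": 1.0}

# a06bc54: double ore/metal, halve ice/water.
SCALES = {"ore": 2.0, "metal": 2.0, "ice": 0.5, "water": 0.5}


def scaled_weights(base: dict, scales: dict) -> dict:
    """Multiply each base weight by its scale (1.0 if a resource has no scale)."""
    return {k: w * scales.get(k, 1.0) for k, w in base.items()}


def shaped_reward(deltas: dict, weights: dict) -> float:
    """Weighted sum of per-step resource gains; missing resources count as 0."""
    return sum(w * deltas.get(k, 0.0) for k, w in weights.items())


weights = scaled_weights(BASE_WEIGHTS, SCALES)
print(weights)  # {'ice': 0.5, 'water': 0.5, 'ore': 2.0, 'metal': 2.0}
print(shaped_reward({"ice": 10, "ore": 4}, weights))  # 13.0
```

Keeping the scales separate from the base table makes it easy to revert or retune the ratios without touching the underlying weights.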
9aa0457 Power gen is before step increment
0c3f351 Expected power gen range
3ec489c Revert "Revert "Add power generation verification""
4158f3c Revert "Add power generation verification"
1945472 Add power generation verification
f13f662 Tweak logging
a281543 Some end-of-game checks
3380841 Compute transfer amount by target capacity
8fb2397 Found transfers not being invalid masked away
668a34e += modifies numpy arrays in place?
53e32e8 Limit move repeat with expected power usage
e14d2ff Limit repeats of all actions
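Three of these commits touch related mechanics: sizing a transfer by the target's remaining capacity (3380841), masking away transfers that would be invalid (8fb2397), and NumPy's `+=` operating in place (668a34e). A minimal sketch of all three, assuming each candidate target has a known current amount and capacity for the resource. The function names are hypothetical and this is not the repo's masking code. Masked actions get a large negative logit rather than `-inf`, which keeps the softmax finite even if every action in a row were masked.

```python
import numpy as np


def transfer_amount(cargo, target_amount, target_capacity):
    """Hypothetical: move only as much as each target can still hold."""
    free = np.maximum(np.asarray(target_capacity) - np.asarray(target_amount), 0)
    return np.minimum(cargo, free)


def transfer_mask(cargo, target_amount, target_capacity):
    """A transfer is treated as valid only if it would move a positive amount."""
    return transfer_amount(cargo, target_amount, target_capacity) > 0


def masked_softmax(logits, mask):
    """Softmax over valid actions; masked entries get a large negative logit."""
    z = np.where(mask, logits, -1e9)
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


amount = transfer_amount(10, [0, 95, 100], [100, 100, 100])
mask = transfer_mask(10, [0, 95, 100], [100, 100, 100])
print(amount)                             # [10  5  0]
print(masked_softmax(np.zeros(3), mask))  # [0.5 0.5 0. ]

# Pitfall behind "+= modifies numpy arrays in place?": augmented assignment
# writes into the existing buffer, so every alias of that array sees the change.
base = np.zeros(3)
alias = base        # same array object, not a copy
alias += 1          # in-place add: base is now [1., 1., 1.] as well
safe = base.copy()  # an explicit copy gets its own buffer
safe += 1           # base stays [1., 1., 1.]
```

If an observation or mask array is reused across steps, `x = x + y` (which allocates a new array) or an explicit `.copy()` avoids silently mutating shared state.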
ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-08T20:03:05.262620 (A10) & ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-08T20:03:16.168048 (RTX-6000, killed at 30M steps because it was 40% slower)
349332e Forgot import
a508765 Update water pickup action for min amount
e794435 Don’t pickup last 50 days of water
7fb940f Lux S2 constant learning rate
502e1b4 Tweak debug hyperparams
668ca91 First factories get 2 heavy robots