LUX follow-up rl-algo-impls v0.0.13
Created on April 26 | Last edited on May 10
9aa0457 Power gen is before step increment
0c3f351 Expected power gen range
3ec489c Revert "Revert "Add power generation verification""
4158f3c Revert "Add power generation verification"
1945472 Add power generation verification
f13f662 Tweak logging
a281543 Some end-of-game checks
3380841 Compute transfer amount by target capacity
8fb2397 Found transfers not being invalid masked away
668a34e += modifies numpy arrays in place?
53e32e8 Limit move repeat with expected power usage
e14d2ff Limit repeats of all actions
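The question in 668a34e has a definite answer: yes, `+=` on a numpy array calls `__iadd__` and mutates the existing buffer, so every view or alias of that array sees the change, unlike `a = a + b`, which allocates a new array. A quick demonstration:

```python
import numpy as np

a = np.zeros(3)
alias = a  # second reference to the same buffer

a += 1.0  # in-place: __iadd__ mutates the existing array
assert alias[0] == 1.0  # the alias sees the mutation

a = a + 1.0  # out-of-place: allocates a new array and rebinds `a`
assert alias[0] == 1.0  # the alias still points at the old buffer
```

This matters whenever a mask or reward buffer is shared between the env and the learner: an accidental in-place `+=` leaks the change to every holder of the reference.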
- ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-08T20:03:05.262620 (A10) & ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-08T20:03:16.168048 (RTX 6000)
349332e Forgot import
a508765 Update water pickup action for min amount
e794435 Don’t pickup last 50 days of water
7fb940f Lux S2 constant learning rate
502e1b4 Tweak debug hyperparams
668ca91 First factories get 2 heavy robots
A breakthrough. The secret was to stop letting robots pick up enough water to insta-kill the factory. This gave robots time to roam around and collect resources.
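For context, a hedged sketch of the water-pickup cap behind a508765/e794435; the names and constants below are illustrative guesses, not the repo's actual implementation (Lux S2 factories consume 1 water per step):

```python
WATER_PER_DAY = 1  # assumed factory water consumption per step
RESERVE_DAYS = 50  # per e794435: never take the factory's last 50 days of water

def water_pickup_cap(factory_water: int) -> int:
    """Upper bound on water a robot may pick up from a factory.

    Leaves the factory at least RESERVE_DAYS of water so a single pickup
    can never insta-kill it; anything above the reserve is fair game.
    """
    reserve = RESERVE_DAYS * WATER_PER_DAY
    return max(0, factory_water - reserve)
```

The invalid-action mask would then zero out any pickup amount above this cap.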
The RTX 6000 is about 60% as fast as the A10 while costing 86% as much (roughly 1.4x the cost per training step), so stick with the A10.
- ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-05T18:50:18.120939 (killed at 46M steps)
c861407 Ice/ore rubble clearing are “accumulation stats”825ae8b Add rubble cleared off ice/ore reward metrics
This doesn't seem to have helped measurably.
- ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-05T17:51:16.726649 (killed at 45M steps)
40055ac Disable the “embed” layer
Back to the ~2000 ice wall.
- ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-04T01:00:18.919432 (killed at 54M steps):
1f4b699 Don’t check transfer action if there is no valid action
d0f9e1b Move Lux files to its own top-level directory
8194823 embed_layer option adds a 1x1 convolution to input
7626593 Fix setting reward_weights in async
4671c81 Remove move, transfer, and builds that lead to destruction
0fd7ad3 Set the unwrapped env, not just check if unwrapped has reward_weights
c8beb11 p -> player
c526710 Linear decay of entropy
cfeac11 Up version for training phases and other tweaks
313f227 Set the unwrapped reward_weights during phase setting
The embed layer did not work at all.
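For reference, the embed layer from 8194823 is a 1x1 convolution over the input planes: a learned per-cell linear mixing of observation channels before the backbone, with no spatial mixing. A minimal PyTorch sketch (channel counts are illustrative):

```python
import torch
import torch.nn as nn

class EmbedLayer(nn.Module):
    """1x1 convolution over the observation planes: projects each map
    cell's channel vector independently, with no spatial mixing."""

    def __init__(self, in_channels: int, embed_channels: int) -> None:
        super().__init__()
        self.conv = nn.Conv2d(in_channels, embed_channels, kernel_size=1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, in_channels, height, width)
        return self.conv(obs)

# e.g. project 30 observation planes to 64 channels on a 48x48 Lux map
embed = EmbedLayer(30, 64)
assert embed(torch.zeros(1, 30, 48, 48)).shape == (1, 64, 48, 48)
```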
- ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-03T06:20:01.531121: https://github.com/sgoodfriend/rl-algo-impls/commit/4860e66bcfed536230a9528050ab1ed16f381d5f
4860e66 Strings to constants in LuxHyperparamTransitions
40cd687 Fixed reward_weights not being set on the LuxEnvGridnet
7ebe747 Add logs for LuxHyperparamTransitions
fcf4bd4 Add gae_lambda and first_reward_weight metrics
29bcba4 Use extremely short timeframe to test transitions
This model failed because reward_weights weren't being set properly.
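The failure mode (fixed across 40cd687 and 0fd7ad3) looks like the classic Gym wrapper pitfall: attribute reads fall through a `gym.Wrapper` to the wrapped env, but attribute writes land on the wrapper itself and shadow the inner value. A toy reproduction (the `reward_weights` name matches the repo; `ToyLuxEnv` is illustrative):

```python
import gym

class ToyLuxEnv(gym.Env):
    """Stand-in env that reads self.reward_weights when computing reward."""
    def __init__(self):
        self.action_space = gym.spaces.Discrete(1)
        self.observation_space = gym.spaces.Discrete(1)
        self.reward_weights = {"ice": 1.0}

env = gym.Wrapper(ToyLuxEnv())

# Buggy: this creates reward_weights ON THE WRAPPER; the inner env,
# which actually computes the reward, never sees the new value.
env.reward_weights = {"ice": 0.0}
assert env.unwrapped.reward_weights == {"ice": 1.0}

# Fix: write through to the unwrapped env.
env.unwrapped.reward_weights = {"ice": 0.0}
assert env.unwrapped.reward_weights == {"ice": 0.0}
```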
- ppo-LuxAI_S2-v0-A10-S1-2023-04-28T08:43:12.754569 & ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-01T08:09:40.050141: https://github.com/sgoodfriend/rl-algo-impls/commit/8293c9ecc221d73ab154d6c259dfb0317793d737
8293c9e Half learning rate to prevent >0.01 KL-divergence
This was surprisingly disappointing, as the agent hit the ~2000 ice/~300 game-length wall.
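For reference, the metric behind 8293c9e is the standard low-variance KL estimator computed from the PPO probability ratio; when it climbs past ~0.01, updates are stepping too far off-policy, and lowering the learning rate is the blunt fix:

```python
import torch

def approx_kl(logprob_old: torch.Tensor, logprob_new: torch.Tensor) -> torch.Tensor:
    """Low-variance estimator of KL(old || new) from action log-probs:
    E[(ratio - 1) - log(ratio)], non-negative and ~0 for small updates."""
    logratio = logprob_new - logprob_old
    ratio = logratio.exp()
    return ((ratio - 1) - logratio).mean()
```

Halving the learning rate is aimed at keeping this estimator below 0.01; per the note above, it did not break through the wall.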
- ppo-LuxAI_S2-v0-A10-S1-2023-04-28T00:33:43.295061 & ppo-LuxAI_S2-v0-A10-100M-S1-2023-04-28T00:22:38.891889: https://github.com/sgoodfriend/rl-algo-impls/commit/e3365fe108f660547b6e5467caee614c44c79833
e3365fe Autoformat
d69cee1 Fix transition progress calculation
1e9cb04 Add option to always record video during eval
92269f5 LuxHyperparamTransitions allows multi-phase transitions
a67d80b Cleanup RewardDecayCallback
This is the first reproduction of a full-length game-survival agent. The big change is the addition of LuxHyperparamTransitions (however, the reward_weights were not actually being updated).
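LuxHyperparamTransitions isn't detailed here, but per d69cee1 and 92269f5 it schedules hyperparameters across multiple phases as training progresses. A hedged sketch of linear multi-phase interpolation; the function and phase dicts are hypothetical, not the repo's API:

```python
from typing import Dict, List

def interpolate_phases(
    phases: List[Dict[str, float]], progress: float
) -> Dict[str, float]:
    """Linearly interpolate hyperparameters across an ordered list of
    phases, where progress in [0, 1] is the fraction of training done."""
    if len(phases) == 1:
        return dict(phases[0])
    num_segments = len(phases) - 1
    position = min(max(progress, 0.0), 1.0) * num_segments
    idx = min(int(position), num_segments - 1)  # which segment we are in
    frac = position - idx  # how far through that segment
    lo, hi = phases[idx], phases[idx + 1]
    return {k: lo[k] + frac * (hi[k] - lo[k]) for k in lo}

# e.g. decay a shaped ice reward while ramping a sparse win reward
phases = [
    {"ice_reward": 1.0, "win_reward": 0.0},
    {"ice_reward": 0.5, "win_reward": 0.5},
    {"ice_reward": 0.0, "win_reward": 1.0},
]
assert interpolate_phases(phases, 0.0) == phases[0]
assert interpolate_phases(phases, 1.0) == phases[-1]
```

The bug called out above is the last mile: computing the schedule does nothing if the interpolated reward_weights never reach the env (see 313f227).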
- ppo-LuxAI_S2-v0-A10-S1-2023-04-26T23:47:00.718479: https://github.com/sgoodfriend/rl-algo-impls/commit/999f95a178f135d2b0b338f990672b09e86e9d81
999f95a Add collision possibility and factory tile to observation
- ppo-LuxAI_S2-v0-A10-S1-2023-04-26T15:51:16.915466: https://github.com/sgoodfriend/rl-algo-impls/commit/29268d67e02898446b299770be1fc536e9b1e490
29268d6 Remove in-game lichen as a reward
- ppo-LuxAI_S2-v0-A10-S1-2023-04-26T15:49:16.282556: https://github.com/sgoodfriend/rl-algo-impls/commit/15a543cec04fb2e283331e060d09e3c4723be239
15a543c Fix the move_validity_map for transfers
- ppo-LuxAI_S2-v0-A10-S1-2023-04-26T00:30:10.724924: https://github.com/sgoodfriend/rl-algo-impls/commit/65b727341a3d824deb98312f5127b70eeddaa19b
65b7273 Actually update the reward_weight for eval
Stopped ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-05T01:26:41.781321 because it's just https://github.com/sgoodfriend/rl-algo-impls/commit/1f4b6994e13946b609d7808ecfb5dcf0dc970b7c (which is already running) plus the sync rollout refactor.