
LUX follow-up rl-algo-impls v0.0.13

Created on April 26 | Last edited on May 10

[Charts: run metrics plotted against global_step (20M–100M)]
Run set: 9 runs

9aa0457 Power gen is before step increment
0c3f351 Expected power gen range
3ec489c Revert "Revert "Add power generation verification""
4158f3c Revert "Add power generation verification"
1945472 Add power generation verification
f13f662 Tweak logging
a281543 Some end-of-game checks
3380841 Compute transfer amount by target capacity
8fb2397 Found transfers not being invalid masked away
668a34e += modifies numpy arrays in place?
53e32e8 Limit move repeat with expected power usage
e14d2ff Limit repeats of all actions
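As an aside on commit 668a34e: a minimal, standalone illustration of the NumPy behavior in question (not code from the repo). In-place `+=` mutates the shared buffer, so aliases of the array see the change, while `x = x + 1` allocates a new array.

```python
import numpy as np

a = np.zeros(3)
b = a        # b aliases a; no copy is made
b += 1       # in-place add mutates the shared buffer
print(a)     # [1. 1. 1.] -- a changed too

c = np.zeros(3)
d = c
d = d + 1    # out-of-place add rebinds d to a new array
print(c)     # [0. 0. 0.] -- c is untouched
```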
  • ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-08T20:03:05.262620 (A10) & ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-08T20:03:16.168048 (RTX 6000)
349332e Forgot import
a508765 Update water pickup action for min amount
e794435 Don’t pickup last 50 days of water
7fb940f Lux S2 constant learning rate
502e1b4 Tweak debug hyperparams
668ca91 First factories get 2 heavy robots
A breakthrough. The secret was to stop letting robots pick up enough water to insta-kill the factory. This gave robots time to roam around and collect resources.
The RTX 6000 is about 60% as fast as the A10 while costing 86% as much, so sticking with the A10 is the better value.
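For context on a508765 and e794435, here is a hypothetical sketch (not the repo's actual function) of clamping a water pickup so a single action can't drain the factory; the 50-turn reserve is an assumption that mirrors the commit message.

```python
WATER_RESERVE = 50  # assumed reserve, echoing "don't pickup last 50 days of water"

def clamped_water_pickup(requested: int, factory_water: int, unit_cargo_space: int) -> int:
    """Hypothetical helper: never take water that would drop the factory below
    its reserve, and never take more than the unit can carry."""
    available = max(factory_water - WATER_RESERVE, 0)
    return max(0, min(requested, available, unit_cargo_space))
```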
  • ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-05T18:50:18.120939 (killed at 46M steps)
c861407 Ice/ore rubble clearing are “accumulation stats”
825ae8b Add rubble cleared off ice/ore reward metrics
This doesn't seem to have helped measurably
  • ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-05T17:51:16.726649 (killed at 45M steps)
40055ac Disable the “embed” layer
Back to the ~2000 ice wall
  • ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-04T01:00:18.919432 (killed at 54M steps):
1f4b699 Don’t check transfer action if there is no valid action
d0f9e1b Move Lux files to its own top-level directory
8194823 embed_layer option adds a 1x1 convolution to input
7626593 Fix setting reward_weights in async
4671c81 Remove move, transfer, and builds that lead to destruction
0fd7ad3 Set the unwrapped env, not just check if unwrapped has reward_weights
c8beb11 p -> player
c526710 Linear decay of entropy
cfeac11 Up version for training phases and other tweaks
313f227 Set the unwrapped reward_weights during phase setting
The embed layer did not work at all
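For reference on 8194823 (the option that didn't pan out), a minimal PyTorch sketch of what an optional 1x1-convolution input embedding can look like; the module and parameter names are assumptions, not the repo's.

```python
import torch.nn as nn

class EmbedObservation(nn.Module):
    """Optional 1x1 convolution that remaps raw observation channels
    before the main network; a no-op when disabled."""

    def __init__(self, in_channels: int, embed_channels: int, enabled: bool = True):
        super().__init__()
        self.embed = (
            nn.Conv2d(in_channels, embed_channels, kernel_size=1)
            if enabled
            else nn.Identity()
        )

    def forward(self, obs):
        return self.embed(obs)
```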
4860e66 Strings to constants in LuxHyperparamTransitions
40cd687 Fixed reward_weights not being set on the LuxEnvGridnet
7ebe747 Add logs for LuxHyperparamTransitions
fcf4bd4 Add gae_lambda and first_reward_weight metrics
29bcba4 Use extremely short timeframe to test transitions
This model failed because reward_weights weren't being set properly.
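The reward_weights fixes (0fd7ad3, 313f227, 40cd687) boil down to a wrapper-stacking gotcha; below is a generic sketch with assumed names, not the repo's API: the attribute has to be set on the base env, not just checked for on it.

```python
import gym

def set_reward_weights(env: gym.Env, weights) -> None:
    # env.unwrapped walks past all wrappers to the base environment.
    # Setting the attribute on the outermost wrapper would silently create
    # a new attribute there without ever reaching the env that reads it.
    base_env = env.unwrapped
    if not hasattr(base_env, "reward_weights"):
        raise AttributeError("base env does not expose reward_weights")
    base_env.reward_weights = weights
```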
8293c9e Half learning rate to prevent >0.01 KL-divergence
This was surprisingly disappointing, as the agent hit the ~2000 ice / ~300 episode-length wall
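On 8293c9e: the 0.01 threshold refers to the approximate KL between the old and new policy that PPO implementations commonly track during updates; a generic sketch of that estimator (not necessarily the repo's exact code):

```python
import torch

def approx_kl(old_logprobs: torch.Tensor, new_logprobs: torch.Tensor) -> torch.Tensor:
    # Low-variance KL estimator (http://joschu.net/blog/kl-approx.html):
    # KL(old || new) ~= mean((ratio - 1) - log(ratio)), with ratio = new/old probs.
    log_ratio = new_logprobs - old_logprobs
    return ((log_ratio.exp() - 1) - log_ratio).mean()
```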
e3365fe Autoformat
d69cee1 Fix transition progress calculation
1e9cb04 Add option to always record video during eval
92269f5 LuxHyperparamTransitions allows multi-phase transitions
a67d80b Cleanup RewardDecayCallback
This is the first reproduction of an agent that survives full-length games. The big change is the addition of LuxHyperparamTransitions (however, the reward_weights were not being updated).
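A rough sketch of the idea behind LuxHyperparamTransitions as described here (field names and the interpolation rule are assumptions, not the repo's implementation): each phase pins hyperparameter values such as ent_coef and reward weights, and values are linearly interpolated between consecutive phases as training progress advances.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Phase:
    start_progress: float      # training progress (0..1) at which this phase's values fully apply
    values: Dict[str, float]   # e.g. {"ent_coef": 0.01, "first_reward_weight": 1.0}

def hyperparams_at(phases: List[Phase], progress: float) -> Dict[str, float]:
    """Linearly interpolate between consecutive phases based on training progress."""
    prev = phases[0]
    for nxt in phases[1:]:
        if progress < nxt.start_progress:
            span = nxt.start_progress - prev.start_progress
            t = min(max((progress - prev.start_progress) / span, 0.0), 1.0) if span > 0 else 1.0
            return {k: prev.values[k] + t * (nxt.values[k] - prev.values[k]) for k in prev.values}
        prev = nxt
    return dict(prev.values)  # past the last phase: hold its values
```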
  • ppo-LuxAI_S2-v0-A10-S1-2023-04-26T23:47:00.718479: https://github.com/sgoodfriend/rl-algo-impls/commit/999f95a178f135d2b0b338f990672b09e86e9d81
999f95a Add collision possibility and factory tile to observation
29268d6 Remove in-game lichen as a reward
15a543c Fix the move_validity_map for transfers
65b7273 Actually update the reward_weight for eval
Stopped ppo-LuxAI_S2-v0-A10-100M-S1-2023-05-05T01:26:41.781321 because it's just https://github.com/sgoodfriend/rl-algo-impls/commit/1f4b6994e13946b609d7808ecfb5dcf0dc970b7c (already running) with the sync rollout refactor.
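Finally, several commits across these runs (8fb2397, 1f4b699, 15a543c) deal with invalid-action masking of transfers. A generic sketch of the technique with made-up shapes and indices, not the repo's API: logits for actions that cannot be executed are forced to -inf so they are never sampled.

```python
import numpy as np

TRANSFER = 2  # hypothetical index of the transfer action type

def mask_invalid_transfers(logits: np.ndarray, transfer_valid: np.ndarray) -> np.ndarray:
    """logits: (positions, actions); transfer_valid: (positions,) bool.
    Returns logits with transfer forced to -inf wherever no valid transfer target exists."""
    masked = logits.copy()
    masked[~transfer_valid, TRANSFER] = -np.inf
    return masked
```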