MicroRTS Additional Rewards and Observations
Created on May 25 | Last edited on June 9
1d1195b Fix CriticHead setting hidden layer gains to 1.0
eb5e2f5 Squash advantage by weights before going through loss
267a387 Fix WinLoss draw when no units
b0f6ce8 Reward for factories, heavies, and lights being alive
37b3aff assert -> warn on score_reward != winloss
78553fc microrts winloss 0 can occur even if score_reward is non-zero
57f5923 Assert score_reward is same sign as winloss
No "Switch WinLoss head to use no activation"
d285d50 Switch WinLoss head to use no activation
37b3aff assert -> warn on score_reward != winloss
78553fc microrts winloss 0 can occur even if score_reward is non-zero
57f5923 Assert score_reward is same sign as winloss

b1a4b68 Use ScoreReward as the third head in Microrts and Lux
282f633 Lux reward term based off of the difference of scores
9813dfd Record results info & score based on cost+hp

91d85dd MicroRTSGridModeSharedMemVecEnv isn’t supported
bfe9121 Don’t reward power generation. Reward robot building
cb58909 Give policy hp and resources as floats
8ec7c25 Microrts-selfplay-dc-phases-final adds all maps

ce58e3f Reduce n_epochs to 2 for double-cone microrts
6c5a3dd Replace metal_remaining with factories_to_place
7ed67a8 Get working on A100
cceeda0 Upgrade tensorboard & upgrade java runtime

28227ab Support for different size maps through padding
2f3729f Copy over vec_env from gym_microrts
d44cf1c Specify wheel url for gym-microrts
4003f90 Point gym-microrts to sgoodfriend fork
Baseline using unet and decayed rewards
ppo-Microrts-selfplay-dc-phases-A10-S1-2023-05-30T00:47:31.012042
ppo-Microrts-selfplay-dc-phases-A10-S1-2023-05-28T02:18:36.719127
ppo-Microrts-selfplay-dc-phases-A10-S1-2023-05-26T00:09:17.258204
ppo-Microrts-selfplay-dc-phases-A10-S1-2023-05-25T04:10:09.181610
ppo-Microrts-selfplay-dc-phases-A10-S1-2023-05-24T03:40:49.861095
ppo-Microrts-selfplay-dc-phases-A10-S1-2023-05-19T01:03:24.744977
Baseline