Lux S2: Multiple Rewards and Factory Placement
Created on May 16 | Last edited on June 4
36de4c1 Double metal/ore reward; halve water/ice reward
896bec5 Use finetuned UnitTypeTable
e0db5a1 Don’t import lux or microrts if not used
e71f542 Fix colab links and installation for colab
ea7be5b Point to latest microrts changes
ffdd885 Moved microrts-specific modules to microrts directory
8e9f276 Fix TrainStats repr function
c813dc2 Don’t record ScoreRewardFunction
ce37b7c Point to rebased MicroRTS code
5848177 Microrts: Added mayari (2021 winner) to eval
450ec2b Fixup
f32aef4 Record v_loss and val_clipped_frac by head
f960f4b Require matplotlib be latest
1d1195b Fix CriticHead setting hidden layer gains to 1.0
eb5e2f5 Squash advantage by weights before going through loss
267a387 Fix WinLoss draw when no units
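The "squash advantage by weights" commit above can be illustrated with a minimal numpy sketch (not the repo's actual code): per-head advantages are collapsed into a single scalar advantage via a weighted sum before entering the PPO policy loss. The head names and weights here are assumptions for illustration.

```python
import numpy as np

def squash_advantages(advantages: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Collapse per-head advantages to one scalar per sample.

    advantages: (batch, n_heads); weights: (n_heads,) -> (batch,)
    """
    return advantages @ weights

# Assumed heads: WinLoss, ScoreReward, shaped reward (illustrative values).
adv = np.array([[1.0, -0.5, 0.2],
                [0.0,  2.0, -1.0]])
w = np.array([1.0, 0.5, 0.1])  # assumed per-head weights
squashed = squash_advantages(adv, w)  # one advantage per sample
```

The squashed advantage then feeds the usual clipped surrogate objective, so the policy loss stays scalar even with multiple reward heads.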
b0f6ce8 Reward for factories, heavies, and lights being alive
37b3aff assert -> warn on score_reward != winloss
78553fc microrts winloss 0 can occur even if score_reward is non-zero
57f5923 Assert score_reward is same sign as winloss
Does not include "6c4f610 Switch WinLoss head to use no activation".
6c4f610 Switch WinLoss head to use no activation
78553fc microrts winloss 0 can occur even if score_reward is non-zero
57f5923 Assert score_reward is same sign as winloss
b1a4b68 Use ScoreReward as the third head in Microrts and Lux
282f633 Lux reward term based off of the difference of scores
9813dfd Record results info & score based on cost+hp
91d85dd MicroRTSGridModeSharedMemVecEnv isn’t supported
bfe9121 Don’t reward power generation. Reward robot building
cb58909 Give policy hp and resources as floats
8ec7c25 Microrts-selfplay-dc-phases-final adds all maps
ce58e3f Reduce n_epochs to 2 for double-cone microrts
6c5a3dd Replace metal_remaining with factories_to_place
4e033fe Reduce n_epochs from 4 to 2 for Lux to reduce kl-div
7ed67a8 Get working on A100
cceeda0 Upgrade tensorboard & upgrade java runtime
28227ab Support for different size maps through padding
2f3729f Copy over vec_env from gym_microrts
d44cf1c Specify wheel url for gym-microrts
4003f90 Point gym-microrts to sgoodfriend fork
5c0a469 Support for multiple map_paths (limit to save size)
3f97901 A10 variant of Microrts with double-cone
4a0ce30 Microrts with double-cone and HyperparamTransitions
5ca3755 LuxHyperparamTransitions -> HyperparamTransitions
c4e5e04 lux_hyperparam_transitions_kwargs -> hyperparam_transitions_kwargs
ebe14de Swap every 3000 steps in Lux
2f578ea Fix other envs to use rollout_hyperparams
Mostly changes for MicroRTS, with three small changes for Lux
c2c3b49 Use sync vec env for Lux
0402f24 Add metal remaining to assign as observation
431bf31 30 million step Lux
5bd3dbb gamma and gae_lambda no longer 1
c117dba Recharge on an empty queue is a waste of power
4faa853 max_move_repeats accounts for enqueuing power cost
e154efb Respect bid_std_dev setting
0855050 Support for factory placement through Dict Space
17f4706 Factor RolloutGenerator into top-level
c4f37eb selfPlayWrapper -> self_play_wrapper
6ea561b RolloutGenerator and SyncStepRolloutGenerator
f57a6ef Support ent_coef in LuxHyperparamTransitions*
0d1f0fd Add value_shape to ACN abstract class*
4cc095f Metric vf_coef*
d402515 [LuxS2] ent_coef by phases*
2e65826 Double cone network (#25)
51968fc Always bid 0
96d67d0 Fix cancel_action/move using the wrong unit
3017445 score_threshold to determine best in eval

* Also run in prior runs
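The "Support for factory placement through Dict Space" commit suggests the placement phase is exposed as a dictionary-style observation alongside the board. A hedged sketch of what such an observation might look like; the keys, shapes, and the 48x48 map size are assumptions, not the repo's actual spaces.

```python
import numpy as np

def make_placement_obs(map_size: int = 48) -> dict:
    """Illustrative Dict-style observation for the factory-placement phase."""
    return {
        # Per-cell board features (rubble, ice, ore, etc. would be channels).
        "board": np.zeros((map_size, map_size), dtype=np.float32),
        # Replaces metal_remaining per the commit above: count of factories
        # this player still gets to place.
        "factories_to_place": np.array([3], dtype=np.int32),
        # Legal spawn cells; a placement policy would mask logits with this.
        "valid_spawn_mask": np.ones((map_size, map_size), dtype=bool),
    }

obs = make_placement_obs()
```

A policy consuming this would mask its placement logits with `valid_spawn_mask` and decrement `factories_to_place` after each placement turn.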
- Multiple reward outputs, multiple critic values, and support for these multiple outputs in PPO
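The multiple-critic-values idea can be sketched as follows (a minimal numpy illustration, not the repo's implementation): the value function outputs one estimate per reward head, and the value loss is computed per head, which is what makes recording `v_loss` by head possible. Shapes and the two-head setup are assumptions.

```python
import numpy as np

def per_head_value_loss(values: np.ndarray, returns: np.ndarray) -> np.ndarray:
    """MSE value loss per reward head.

    values, returns: (batch, n_heads) -> loss per head, shape (n_heads,)
    """
    return ((values - returns) ** 2).mean(axis=0)

# Illustrative batch of 2 samples with 2 reward heads (e.g. WinLoss, ScoreReward).
values = np.array([[0.5, 0.1],
                   [0.0, 1.0]])
returns = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
loss_by_head = per_head_value_loss(values, returns)  # loggable per head
total_v_loss = loss_by_head.sum()  # single scalar for the optimizer
```

Summing (or weighting) the per-head losses into one scalar keeps the rest of the PPO update unchanged while preserving per-head diagnostics.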
Is the double-cone-network branch rebased off of the reward changes?