
Lux S2: Multiple Rewards and Factory Placement

Created on May 16 | Last edited on June 4

[Two run-set charts plotted against global_step (0–30M); Run set: 10 runs]

36de4c1 Double metal/ore reward; halve water/ice reward
896bec5 Use finetuned UnitTypeTable
e0db5a1 Don’t import lux or microrts if not used
e71f542 Fix colab links and installation for colab
ea7be5b Point to latest microrts changes
ffdd885 Moved microrts-specific modules to microrts directory
8e9f276 Fix TrainStats repr function
c813dc2 Don’t record ScoreRewardFunction
ce37b7c Point to rebased MicroRTS code
5848177 Microrts: Added mayari (2021 winner) to eval
450ec2b Fixup
f32aef4 Record v_loss and val_clipped_frac by head
f960f4b Require matplotlib be latest
1d1195b Fix CriticHead setting hidden layer gains to 1.0
eb5e2f5 Squash advantage by weights before going through loss
267a387 Fix WinLoss draw when no units
b0f6ce8 Reward for factories, heavies, and lights being alive
37b3aff assert -> warn on score_reward != winloss
78553fc microrts winloss 0 can occur even if score_reward is non-zero
57f5923 Assert score_reward is same sign as winloss
Does not include "6c4f610 Switch WinLoss head to use no activation".
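Two of the changes above pair naturally: 36de4c1 reweights the individual reward terms, and eb5e2f5 collapses the per-head advantages with those weights before the PPO loss. A minimal sketch of that idea, assuming advantages of shape (batch, n_reward_heads); the weight values and tensor names are illustrative, not the repo's actual config:

```python
import torch

# Hypothetical per-term reward weights after 36de4c1 (metal/ore doubled,
# water/ice halved); the values and ordering are illustrative only.
reward_weights = torch.tensor([2.0, 2.0, 0.5, 0.5, 1.0])

def squash_advantages(advantages: torch.Tensor) -> torch.Tensor:
    """Collapse per-head advantages (batch, n_heads) to one scalar per sample
    by weighting them just before the PPO policy loss (cf. eb5e2f5)."""
    return advantages @ reward_weights  # -> (batch,)
```

Keeping the heads separate until this point is also what lets v_loss and val_clipped_frac be recorded per head (f32aef4).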
6c4f610 Switch WinLoss head to use no activation
78553fc microrts winloss 0 can occur even if score_reward is non-zero
57f5923 Assert score_reward is same sign as winloss
b1a4b68 Use ScoreReward as the third head in Microrts and Lux
282f633 Lux reward term based off of the difference of scores
9813dfd Record results info & score based on cost+hp
91d85dd MicroRTSGridModeSharedMemVecEnv isn’t supported
bfe9121 Don’t reward power generation. Reward robot building
cb58909 Give policy hp and resources as floats
8ec7c25 Microrts-selfplay-dc-phases-final adds all maps
ce58e3f Reduce n_epochs to 2 for double-cone microrts
6c5a3dd Replace metal_remaining with factories_to_place
4e033fe Reduce n_epochs from 4 to 2 for Lux to reduce kl-div
7ed67a8 Get working on A100
cceeda0 Upgrade tensorboard & upgrade java runtime
28227ab Support for different size maps through padding
2f3729f Copy over vec_env from gym_microrts
d44cf1c Specify wheel url for gym-microrts
4003f90 Point gym-microrts to sgoodfriend fork
5c0a469 Support for multiple map_paths (limit to save size)
3f97901 A10 variant of Microrts with double-cone
4a0ce30 Microrts with double-cone and HyperparamTransitions
5ca3755 LuxHyperparamTransitions -> HyperparamTransitions
c4e5e04 lux_hyperparam_transitions_kwargs -> hyperparam_transitions_kwargs
ebe14de Swap every 3000 steps in Lux
2f578ea Fix other envs to use rollout_hyperparams
Mostly changes for Microrts, with 3 small changes for Lux.
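Several of these runs lean on HyperparamTransitions (4a0ce30, 5ca3755, c4e5e04) to move hyperparameters such as ent_coef across training phases. A hedged sketch of phase-based linear interpolation; the function and phase structure are assumptions, not the repo's exact API:

```python
from typing import Dict, List

def interpolate_hyperparams(
    phases: List[Dict[str, float]], progress: float
) -> Dict[str, float]:
    """Linearly blend hyperparameters between evenly spaced phases.

    progress: fraction of training completed, in [0, 1].
    """
    if len(phases) == 1:
        return dict(phases[0])
    span = 1.0 / (len(phases) - 1)
    idx = min(int(progress / span), len(phases) - 2)
    frac = (progress - idx * span) / span
    lo, hi = phases[idx], phases[idx + 1]
    return {k: lo[k] + frac * (hi[k] - lo[k]) for k in lo}

# Illustrative phases, e.g. annealing ent_coef (cf. d402515 "ent_coef by phases")
phases = [{"ent_coef": 0.01, "learning_rate": 3e-4},
          {"ent_coef": 0.001, "learning_rate": 1e-4}]
print(interpolate_hyperparams(phases, progress=0.5))
```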
c2c3b49 Use sync vec env for Lux
0402f24 Add metal remaining to assign as observation
431bf31 30 million step Lux
5bd3dbb gamma and gae_lambda no longer 1
c117dba Recharge on an empty queue is a waste of power
4faa853 max_move_repeats accounts for enqueuing power cost
e154efb Respect bid_std_dev setting
0855050 Support for factory placement through Dict Space
17f4706 Factor RolloutGenerator into top-level
c4f37eb selfPlayWrapper -> self_play_wrapper
6ea561b RolloutGenerator and SyncStepRolloutGenerator
f57a6ef Support ent_coef in LuxHyperparamTransitions
*0d1f0fd Add value_shape to ACN abstract class
*4cc095f Metric vf_coef
*d402515 [LuxS2] ent_coef by phases
*2e65826 Double cone network (#25)
51968fc Always bid 0
96d67d0 Fix cancel_action/move using the wrong unit
3017445 score_threshold to determine best in eval
* Also run in prior runs
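Among the commits above, 0855050 adds factory placement through a Dict space, and 6c5a3dd feeds factories_to_place to the policy as an observation. A minimal sketch of what such a space could look like, using gymnasium (the repo may use gym); the keys, shapes, and bounds are assumptions, not the repo's actual definition:

```python
import numpy as np
from gymnasium import spaces

MAP_SIZE = 48  # Lux S2 default board size

# Hypothetical Dict observation for the factory-placement phase: a per-tile
# feature map plus scalar placement state such as factories_to_place.
observation_space = spaces.Dict({
    "map": spaces.Box(low=0.0, high=1.0, shape=(13, MAP_SIZE, MAP_SIZE), dtype=np.float32),
    "factories_to_place": spaces.Box(low=0, high=5, shape=(1,), dtype=np.int32),
})

# Placement action: pick a tile index on the flattened board.
action_space = spaces.Discrete(MAP_SIZE * MAP_SIZE)
```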
  • Multiple reward outputs, multiple critic values, and support for these multiple outputs in PPO (sketched below)
Is the double-cone-network branch rebased off of the reward changes?
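The bullet above (and the asterisked 0d1f0fd, which adds value_shape to the ACN abstract class) points at a critic with one value per reward head, e.g. WinLoss plus ScoreReward (b1a4b68). A minimal PyTorch sketch under those assumptions; layer sizes and names are illustrative:

```python
import torch
import torch.nn as nn

class MultiHeadCritic(nn.Module):
    """Critic emitting one value per reward head (e.g. WinLoss, ScoreReward, ...)."""

    def __init__(self, in_dim: int, n_value_heads: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, n_value_heads),  # i.e. value_shape = (n_value_heads,)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)  # (batch, n_value_heads)

def value_loss_by_head(values: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """MSE per head, so each head's v_loss can be logged separately (cf. f32aef4)."""
    return ((values - returns) ** 2).mean(dim=0)  # (n_value_heads,)
```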