
Gridnet2

Created on January 27|Last edited on January 28

[Charts: training metrics plotted against global_step and Step for the selected run sets]

Bonus Section 2: Selfplay
We have also tried some selfplay experiments; selfplay is a crucial component in recent work such as AlphaStar (Vinyals et al. 2019). If the agents issue actions via Gridnet, selfplay can be implemented naturally on top of PPO's parallel environments. That is, assuming there are 2 parallel environments, we can spawn a single game under the hood, return player 1's and player 2's observations as the first and second parallel environments' observations, respectively, and apply each environment's actions as the corresponding player's actions.
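The mapping above can be sketched as a thin vectorized-environment wrapper. This is a minimal illustration, not the actual implementation: `ToyTwoPlayerGame` is a hypothetical stand-in for the real two-player game (in practice, Gym-microRTS), and `SelfPlayVecEnv` just shows how one game can masquerade as two parallel environments, one per player.

```python
import numpy as np


class ToyTwoPlayerGame:
    """Hypothetical stand-in for a two-player game such as microRTS."""

    def __init__(self, obs_dim=4, horizon=10):
        self.obs_dim = obs_dim
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        # One observation per player.
        return [np.zeros(self.obs_dim), np.ones(self.obs_dim)]

    def step(self, action_p1, action_p2):
        self.t += 1
        obs = [np.full(self.obs_dim, self.t), np.full(self.obs_dim, -self.t)]
        # Toy zero-sum reward so the two players' rewards mirror each other.
        rewards = [float(action_p1 - action_p2), float(action_p2 - action_p1)]
        done = self.t >= self.horizon
        return obs, rewards, done


class SelfPlayVecEnv:
    """Expose one two-player game as 2 'parallel environments':
    index 0 is player 1's view, index 1 is player 2's view."""

    num_envs = 2

    def __init__(self, game):
        self.game = game

    def reset(self):
        return np.stack(self.game.reset())  # shape (2, obs_dim)

    def step(self, actions):
        # actions[0] comes from "env 0" (player 1), actions[1] from player 2.
        obs, rewards, done = self.game.step(actions[0], actions[1])
        if done:
            obs = self.game.reset()
        return np.stack(obs), np.array(rewards), np.array([done, done])


env = SelfPlayVecEnv(ToyTwoPlayerGame())
obs = env.reset()                         # (2, obs_dim): one row per player
obs, rew, dones = env.step(np.array([1, 0]))
```

A single policy then acts in both "environments," so it plays both sides of the same game; PPO's rollout code needs no changes beyond this wrapper.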
However, note that the agents in the selfplay experiments learn to handle both starting locations on the map, which is a different setting. For a fair comparison with the other experiments in the main text, those experiments would also need to be trained with randomized starting locations. Nevertheless, it is fun to watch the RL agent fight against itself:
