
Gridnet


Internal Code Name: ppo_gridnet_diverse_encode_decode
Full Experiment Name: PPO + invalid action masking + diverse opponents + encoder-decoder for Gridnet
[Charts: training metrics vs. global_step (0–300M) for ppo_gridnet_diverse_encode_decode, seeds 1–4]
[Video: RL agent against randomAI]
[Video: RL agent against workerRushAI]
Bonus Section 2: Selfplay
We have also tried some selfplay experiments; selfplay is a crucial component of recent work such as AlphaStar (Vinyals et al., 2019). If the agents issue actions via Gridnet, selfplay can be implemented naturally on top of PPO's parallel environments: assuming there are 2 parallel environments, we can spawn a single game under the hood, return player 1's and player 2's observations as the first and second parallel environments' observations, respectively, and apply each player's actions to that same game.
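To make this concrete, below is a minimal sketch of exposing one two-player game as two slots of a vectorized environment, assuming a gym-like two-player interface. The class name SelfplayVecEnv and the make_two_player_env factory are hypothetical illustrations, not part of the actual codebase.

```python
import numpy as np


class SelfplayVecEnv:
    """Exposes one two-player game as two slots of a vectorized environment.

    Slot 0 receives player 1's observation and slot 1 receives player 2's,
    so a single PPO agent effectively plays both sides of the same game.
    """

    def __init__(self, make_two_player_env):
        # Hypothetical factory that creates one underlying two-player game.
        self.env = make_two_player_env()
        self.num_envs = 2

    def reset(self):
        obs_p1, obs_p2 = self.env.reset()   # one game, two observations
        return np.stack([obs_p1, obs_p2])   # shape: (2, *obs_shape)

    def step(self, actions):
        # actions[0] is player 1's Gridnet action, actions[1] is player 2's;
        # both are applied to the same underlying game in a single step.
        (obs_p1, obs_p2), (r_p1, r_p2), done, info = self.env.step(
            actions[0], actions[1]
        )
        return (
            np.stack([obs_p1, obs_p2]),
            np.array([r_p1, r_p2]),          # zero-sum game: r_p2 == -r_p1
            np.array([done, done]),
            [info, info],
        )
```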
However, note that the agents in the selfplay experiments are learning to handle both starting locations of the map, which is a different setting. For a fair comparison with other experiments in the main text, the other experiments would also need to be configured to learn with randomized starting locations. Nevertheless, it is fun to see the RL agent fight against itself:
