Adversarial examples are a known problem in image classification. Deep reinforcement learning policies are similarly vulnerable to adversarial manipulation of their observations. In general, an attacker cannot explicitly modify another agent's observations, but in a shared multi-agent environment an attacker might be able to choose actions specifically to induce observations in the other agent(s) that look natural yet are adversarial. This is precisely what the Adversarial Policies project by Adam Gleave et al. demonstrates by construction in simulated zero-sum games between two humanoid robots with basic proprioception (e.g. two wrestlers, or a kicker and a goalie, based on MuJoCo environments).
To train the adversarial policies referenced in the paper, I set up the W&B TensorFlow integration with
sync_tensorboard=True and run the training with
python -m aprl.train with env_name=multicomp/[ENV NAME]-v0 paper
where [ENV NAME] is one of the environments from the paper, e.g. SumoAnts.
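To launch several of these runs in a loop, it can help to build the command string programmatically. The sketch below is a hypothetical helper (not part of the aprl codebase); it just fills in the env_name pattern shown above:

```python
def make_train_cmd(env_short_name, version=0):
    """Assemble the aprl training command for a given multicomp environment.

    Hypothetical convenience helper: the command layout mirrors the one
    shown above; env_short_name is e.g. "SumoAnts" or "KickAndDefend".
    """
    env_name = "multicomp/{}-v{}".format(env_short_name, version)
    return "python -m aprl.train with env_name={} paper".format(env_name)

print(make_train_cmd("SumoAnts"))
# -> python -m aprl.train with env_name=multicomp/SumoAnts-v0 paper
```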
This lets me log and compare the full training curves of the models presented in the paper and easily explore how hyperparameter changes might affect my results. Note that the full 20M timesteps of training may not be done by the time you see this report :)
Below, you can see that the adversarial policy converges more reliably and may be more effective in the higher-dimensional, more complex SumoHumans environment (blue) compared to the lower-dimensional, simpler SumoAnts environment (orange). It also appears that the adversarial policy is more effective in the goal-blocking scenario (KickAndDefend, red) than in the line-guarding scenario (YouShallNotPass, purple).
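The "Fraction of wins" comparison above boils down to tallying episode outcomes for the adversary. A toy sketch (not the project's code; the function name and outcome labels are illustrative) of that computation:

```python
from collections import Counter

def outcome_fractions(outcomes):
    """Fraction of wins/losses/ties for the adversary over a batch of episodes.

    `outcomes` is a list of per-episode labels like "win", "loss", "tie".
    """
    counts = Counter(outcomes)
    n = len(outcomes)
    return {k: counts[k] / n for k in ("win", "loss", "tie")}

print(outcome_fractions(["win", "loss", "win", "tie"]))
# -> {'win': 0.5, 'loss': 0.25, 'tie': 0.25}
```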
Using the tabs and checkboxes, you can toggle each baseline model on and off for easier comparison. For example, you could compare the two Sumo versions alone, see more detail in the policy entropy curves by turning off SumoAnts (orange), and read the "Fraction of wins" chart in the bottom right most easily by selecting just one baseline.