
Learning From Your Enemy

A MARL training report with renders and training curves for various RL agents vs. bots and RL agents vs. RL agents, demonstrating the main theme of the AlphaTank repo: "not every action the agent takes is sensible, but it works!" The behaviors represent something the agent has learned, something that is not human intelligence.
Created on March 16 | Last edited on March 20

Agent vs. Bots Rendering Performance

The agent learns a different strategy against each bot type, depending on the opponent it encounters (a sketch of the bot archetypes follows this list):
  • Smart: the agent learns to set "traps" for the smart bot, having learned that the bot will reliably chase it.
  • Defensive: the agent tries to dodge bullets when facing the defensive bot.
  • Aggressive: the agent learns to circle around the aggressive bot to land hits, as well as to backtrack and shoot.
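Since these counter-strategies are driven by how each scripted bot behaves, here is a minimal sketch of what the three bot archetypes might look like. All names (`bot`, `enemy`, the action dict keys) are hypothetical illustrations under an assumed game state, not AlphaTank's actual API.

```python
import math

def aggressive_policy(bot, enemy):
    """Charge straight at the enemy and fire on sight."""
    angle = math.atan2(enemy.y - bot.y, enemy.x - bot.x)
    return {"turn": angle - bot.heading, "move": 1.0, "fire": True}

def defensive_policy(bot, enemy, bullets):
    """Hold back and sidestep incoming bullets."""
    if bullets:
        # dodge perpendicular to the direction of the nearest bullet
        b = min(bullets, key=lambda b: (b.x - bot.x) ** 2 + (b.y - bot.y) ** 2)
        dodge = math.atan2(b.y - bot.y, b.x - bot.x) + math.pi / 2
        return {"turn": dodge - bot.heading, "move": 1.0, "fire": False}
    return {"turn": 0.0, "move": -0.5, "fire": False}  # otherwise back away

def smart_policy(bot, enemy):
    """Chase the enemy and lead shots toward its predicted position."""
    lead_x = enemy.x + enemy.vx * 0.5  # predict ~0.5 s ahead
    lead_y = enemy.y + enemy.vy * 0.5
    angle = math.atan2(lead_y - bot.y, lead_x - bot.x)
    return {"turn": angle - bot.heading, "move": 1.0, "fire": True}
```

The smart bot's unconditional chasing is exactly what makes the trap strategy viable: an opponent that always closes distance can be lured into a prepared line of fire.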

[game_video panel]
Run set (4 runs)


Team Players Rendering Performance

  • In a 2 Agents vs. 2 Bots (2 smart bots) setting, the learning agents developed strategies similar to those above.
  • In a 2 Agents vs. 1 Bot (1 defensive bot) setting, the agents still failed to beat the defensive bot.
  • In a 2 Agents vs. 3 Bots (1 aggressive, 1 smart, and 1 defensive bot) setting, the learning agents developed different strategies for different opponents (a loop sketch of this setting follows this list).
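For concreteness, here is a minimal sketch of the 2 Agents vs. 3 Bots episode loop, written in the style of a PettingZoo-like parallel multi-agent environment. `env`, `learner`, `scripted`, and the agent ids are hypothetical placeholders, not the repo's actual API.

```python
from typing import Any, Callable, Dict

def run_episode(env, learner, scripted: Dict[str, Callable]) -> None:
    """One episode: two learning agents (shared policy) vs. scripted bots."""
    obs = env.reset()
    dones = {name: False for name in env.agents}
    while not all(dones.values()):
        actions: Dict[str, Any] = {}
        for name in env.agents:
            if name.startswith("agent"):
                actions[name] = learner.act(obs[name])     # shared learned policy
            else:
                actions[name] = scripted[name](obs[name])  # fixed bot policy
        obs, rewards, dones, infos = env.step(actions)

# Hypothetical usage:
# env = AlphaTankEnv(team_a=["agent_0", "agent_1"],
#                    team_b=["aggressive_0", "smart_0", "defensive_0"])
# run_episode(env, learner, {"aggressive_0": aggressive_policy, ...})
```

Sharing one policy between the two teammates keeps the setup close to the single-agent case, which fits the observation that the team agents rediscover the same per-opponent strategies.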

Run set (5 runs)


Agents vs. Bots Training Curves & Resets

All losses (policy, value, and entropy) drop, while the reward still fluctuates; this is tightly tied to reward engineering, since the reward we design dictates which strategies the agent can learn.
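To make the reward-engineering point concrete, here is a minimal sketch of a shaped reward; the event names and weights are hypothetical, not the actual AlphaTank reward function.

```python
def shaped_reward(events: dict) -> float:
    """Sum of engineered terms; each weight nudges the policy toward a behavior."""
    r = 0.0
    r += 1.0 * events.get("hits_landed", 0)      # rewards aiming; favors aggression
    r -= 1.0 * events.get("hits_taken", 0)       # penalizes getting shot; favors dodging
    r += 5.0 * events.get("enemy_destroyed", 0)  # the sparse objective we actually care about
    r -= 0.01                                    # small per-step cost discourages stalling
    return r
```

Shifting the relative weights (e.g. penalizing `hits_taken` more heavily) would plausibly push the policy toward the dodging and circling behaviors seen in the renders above, which is one reason the reward curve can keep fluctuating even as the losses drop.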

Run set (3 runs)


Cycle Learning Players Training Curves & Resets


Run set


Team Players (2T, 4A) Training Curves & Resets


Run set (2 runs)


Team Players (2T, 2A, 2B) Training Curves & Resets


Run set (3 runs)