Learning From Your Enemy
MARL training report with renderings and learning curves for RL agents vs. bots and RL agents vs. RL agents, demonstrating the main theme of the AlphaTank repo: "not all the actions the agent makes are sensible, but they work!"
It represents something the agent has learned, something that is not human intelligence.
Agent vs. Bots Rendering Performance
The agent learns a different strategy against each bot type, depending on the opponent it encounters (see the evaluation sketch after this list):
- Smart: the agent learns to set "traps" for the smart bot, having learned that it will reliably chase the learning agent.
- Defensive: the agent tries to dodge bullets when facing the defensive bot.
- Aggressive: the agent learns to circle around the aggressive bot to land hits, as well as to backtrack and shoot.
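A minimal sketch of how these per-bot matchups could be evaluated, assuming a hypothetical `make_env(opponent=...)` constructor and a `policy` callable; the names are illustrative, not the actual AlphaTank API:

```python
# Hypothetical evaluation loop: roll out the trained policy against each
# scripted bot type and record win rates. Names are illustrative only.
BOT_TYPES = ["smart", "defensive", "aggressive"]

def evaluate(policy, make_env, episodes_per_bot=10):
    results = {}
    for bot in BOT_TYPES:
        wins = 0
        for _ in range(episodes_per_bot):
            env = make_env(opponent=bot)  # assumed env constructor
            obs, done, info = env.reset(), False, {}
            while not done:
                action = policy(obs)
                obs, reward, done, info = env.step(action)
            wins += int(info.get("winner") == "agent")
        results[bot] = wins / episodes_per_bot
    return results
```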
[Video: game_video]
Team Players Rendering Performance
- In a 2 agents vs. 2 bots (two smart bots) setting, the learning agents developed strategies similar to those above.
- In a 2 agents vs. 1 bot (one defensive bot) setting, the agents still failed to beat the defensive bot.
- In a 2 agents vs. 3 bots (one aggressive, one smart, and one defensive bot) setting, the learning agents developed a different strategy for each opponent; the three settings are summarized in the sketch after this list.
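The three team settings above could be captured in a small match table; a sketch under assumed keys, not AlphaTank's actual config schema:

```python
# Hypothetical match configurations mirroring the three team settings above.
TEAM_MATCHES = [
    {"agents": 2, "bots": ["smart", "smart"]},                     # 2 vs. 2 smart
    {"agents": 2, "bots": ["defensive"]},                          # 2 vs. 1 defensive
    {"agents": 2, "bots": ["aggressive", "smart", "defensive"]},   # 2 vs. 3 mixed
]
```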
Agents vs. Bots Training Curves & Resets
All losses (policy, value, and entropy) drop, while the reward still fluctuates; this is tightly related to reward engineering. The reward we engineer dictates the strategy the agent can learn.
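As one illustration of how shaping terms steer strategy, here is a minimal reward sketch; the event names and weights are hypothetical, not the terms actually used in AlphaTank:

```python
from dataclasses import dataclass

# Hypothetical per-step events; fields and weights are illustrative only.
@dataclass
class StepEvents:
    hit_opponent: bool = False
    got_hit: bool = False
    won: bool = False
    lost: bool = False
    dist_to_opponent: float = 0.0  # normalized to [0, 1]

def shaped_reward(ev: StepEvents) -> float:
    # Each weight nudges the policy toward a different strategy:
    # a larger hit bonus favors aggression, a larger got-hit penalty
    # favors dodging, and the distance term encourages closing in.
    r = 0.0
    r += 1.0 if ev.hit_opponent else 0.0
    r -= 1.0 if ev.got_hit else 0.0
    r += 5.0 if ev.won else 0.0
    r -= 5.0 if ev.lost else 0.0
    r -= 0.1 * ev.dist_to_opponent  # shaping: prefer staying close
    r -= 0.001                      # small time penalty against stalling
    return r
```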
Cycle Learning Players Training Curves & Resets
Team Players (2T, 4A) Training Curves & Resets
Team Players (2T, 2A, 2B) Training Curves & Resets