
Learning From Your Enemy

A MARL training report with renders and training curves for various RL agents vs. bots and RL agents vs. RL agents, demonstrating the main theme of the AlphaTank repo: "not every action the agent takes is sensible, but it works!" The behaviors represent something the agent has learned, something that is not human intelligence.
Created on March 16 | Last edited on March 20

Agent vs. Bots Rendering Performance

The agent learns a different strategy against each bot type, depending on the opponent it encounters (a sketch of the bot archetypes follows this list):
  • Smart: the agent learns to set "traps" for the smart bot, having learned that the bot will reliably chase it.
  • Defensive: the agent tries to dodge bullets when facing the defensive bot.
  • Aggressive: the agent learns to circle around the aggressive bot to land hits, as well as to backtrack and shoot.
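Since these counter-strategies are driven by how each scripted bot behaves, here is a minimal sketch of what the three bot archetypes might look like. All names (`bot`, `enemy`, the action dict keys) are hypothetical illustrations under an assumed game state, not AlphaTank's actual API.

```python
import math

def aggressive_policy(bot, enemy):
    """Charge straight at the enemy and fire on sight."""
    angle = math.atan2(enemy.y - bot.y, enemy.x - bot.x)
    return {"turn": angle - bot.heading, "move": 1.0, "fire": True}

def defensive_policy(bot, enemy, bullets):
    """Hold back and sidestep incoming bullets."""
    if bullets:
        # dodge perpendicular to the direction of the nearest bullet
        b = min(bullets, key=lambda b: (b.x - bot.x) ** 2 + (b.y - bot.y) ** 2)
        dodge = math.atan2(b.y - bot.y, b.x - bot.x) + math.pi / 2
        return {"turn": dodge - bot.heading, "move": 1.0, "fire": False}
    return {"turn": 0.0, "move": -0.5, "fire": False}  # otherwise back away

def smart_policy(bot, enemy):
    """Chase the enemy and lead shots toward its predicted position."""
    lead_x = enemy.x + enemy.vx * 0.5  # predict ~0.5 s ahead
    lead_y = enemy.y + enemy.vy * 0.5
    angle = math.atan2(lead_y - bot.y, lead_x - bot.x)
    return {"turn": angle - bot.heading, "move": 1.0, "fire": True}
```

The smart bot's unconditional chasing is exactly what makes the trap strategy viable: an opponent that always closes distance can be lured into a prepared line of fire.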

[game_video panel]
Run set (4 runs)


Team Players Rendering Performance

  • In a 2 Agents vs. 2 Bots (2 smart bots) setting, the learning agents developed strategies similar to those above.
  • In a 2 Agents vs. 1 Bot (1 defensive bot) setting, the agents still failed to beat the defensive bot.
  • In a 2 Agents vs. 3 Bots (1 aggressive, 1 smart, and 1 defensive bot) setting, the learning agents developed different strategies for different opponents (a loop sketch of this setting follows this list).
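For concreteness, here is a minimal sketch of the 2 Agents vs. 3 Bots episode loop, written in the style of a PettingZoo-like parallel multi-agent environment. `env`, `learner`, `scripted`, and the agent ids are hypothetical placeholders, not the repo's actual API.

```python
from typing import Any, Callable, Dict

def run_episode(env, learner, scripted: Dict[str, Callable]) -> None:
    """One episode: two learning agents (shared policy) vs. scripted bots."""
    obs = env.reset()
    dones = {name: False for name in env.agents}
    while not all(dones.values()):
        actions: Dict[str, Any] = {}
        for name in env.agents:
            if name.startswith("agent"):
                actions[name] = learner.act(obs[name])     # shared learned policy
            else:
                actions[name] = scripted[name](obs[name])  # fixed bot policy
        obs, rewards, dones, infos = env.step(actions)

# Hypothetical usage:
# env = AlphaTankEnv(team_a=["agent_0", "agent_1"],
#                    team_b=["aggressive_0", "smart_0", "defensive_0"])
# run_episode(env, learner, {"aggressive_0": aggressive_policy, ...})
```

Sharing one policy between the two teammates keeps the setup close to the single-agent case, which fits the observation that the team agents rediscover the same per-opponent strategies.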

Run set (5 runs)


Agents vs. Bots Training Curves & Resets

All losses (policy, value, and entropy) drop, while the reward still fluctuates; this is tightly tied to reward engineering, since the reward we design dictates which strategies the agent can learn.
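To make the reward-engineering point concrete, here is a minimal sketch of a shaped reward; the event names and weights are hypothetical, not the actual AlphaTank reward function.

```python
def shaped_reward(events: dict) -> float:
    """Sum of engineered terms; each weight nudges the policy toward a behavior."""
    r = 0.0
    r += 1.0 * events.get("hits_landed", 0)      # rewards aiming; favors aggression
    r -= 1.0 * events.get("hits_taken", 0)       # penalizes getting shot; favors dodging
    r += 5.0 * events.get("enemy_destroyed", 0)  # the sparse objective we actually care about
    r -= 0.01                                    # small per-step cost discourages stalling
    return r
```

Shifting the relative weights (e.g. penalizing `hits_taken` more heavily) would plausibly push the policy toward the dodging and circling behaviors seen in the renders above, which is one reason the reward curve can keep fluctuating even as the losses drop.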

Run set (3 runs)


Cycle Learning Players Training Curves & Resets


Run set


Team Players (2T, 4A) Training Curves & Resets


Run set (2 runs)


Team Players (2T, 2A, 2B) Training Curves & Resets


Run set (3 runs)