DeepNash: DeepMind's Stratego-Playing AI Works With Imperfect Information, Wins
DeepMind's DeepNash AI model plays Stratego, a game of imperfect information and a long-standing target for AI, at an expert level. The open-source algorithm R-NaD powers this new advance.
Following up on Meta AI's announcement of CICERO last week, DeepMind has today put a big focus on its own recent game-playing model, DeepNash. Where Meta's model was built to demonstrate intelligent communication and strategy amongst players, DeepMind's Stratego-playing AI has its sights set on operating in systems full of imperfect information.
Why Stratego matters for AI
AI models like AlphaZero excel at what are called "perfect information" games: games in which all board information is available to all players. In these games, any game state can be analyzed to determine an objectively best play. Chess, despite having an immense number of board states, can be objectively analyzed at every turn.
On the other hand, we have games like poker, where no player knows what their opponents' hands contain. That hidden information lets players bluff their way through bad hands, and it ruins any attempt at objective game-state analysis. Poker is an imperfect information game, and models like AlphaZero cannot be easily applied to it.
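To make the distinction concrete, here is a small illustrative sketch (my own, not from the DeepNash paper): with perfect information you can score the exact state in front of you, while with imperfect information you only see an observation consistent with many hidden states, so any evaluation is an average over beliefs that a bluffing opponent can distort. The function names and numbers below are invented for illustration.

```python
# Illustrative sketch (not from the DeepNash paper): evaluating a position
# with and without hidden information. All names and numbers are made up.

def evaluate_perfect(state_value):
    # Perfect information: the true state is visible, so one number suffices.
    return state_value

def evaluate_imperfect(belief):
    # Imperfect information: we only see an observation consistent with many
    # hidden states, so the best we can do is average over a belief about them,
    # and a bluffing opponent can deliberately skew that belief.
    # belief: list of (probability, value if that hidden state is the real one)
    return sum(p * v for p, v in belief)

print(evaluate_perfect(1.0))                          # chess-like: the state is known
print(evaluate_imperfect([(0.3, -1.0), (0.7, 1.0)]))  # poker-like: 0.4, only an average
```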
Stratego is one such imperfect information game. What makes it such a hurdle for machine learning is that it is also enormously complex in terms of game tree size. Game states cannot be objectively analyzed, and trying to model what the game looks like from the opponent's perspective is impractical because of the number of hidden pieces in play and the length of a game.

Mastering such imperfect information situations could carry over to many real-world scenarios where information about other parties is unknown.
R-NaD: DeepMind's solution
To tackle the unique hurdle Stratego presents, and the broader problem of acting under imperfect information, DeepMind developed Regularized Nash Dynamics (R-NaD), the core algorithmic idea that DeepNash is built on.
Nash refers to a Nash equilibrium, the idea of two (or more) players each using a strategy that is optimal against the others' strategies. This puts the game into a strategic equilibrium: even if the individual strategies differ, perfect play from both sides would result in a perfectly equal win percentage over time. It also leaves strategies in equilibrium inherently unexploitable.
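As a toy illustration of what an equilibrium looks like, the sketch below uses regret matching, a classic self-play technique, in rock-paper-scissors. This is not DeepNash's training procedure; it is only meant to show the average strategy converging to the equilibrium of playing each move one third of the time, a mixed strategy that no fixed counter-strategy can beat on average.

```python
import random

# Toy illustration, not DeepNash's training code: regret matching in
# rock-paper-scissors. Repeated self-play drives the average strategy toward
# the Nash equilibrium (1/3, 1/3, 1/3), which no fixed strategy can exploit.
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats action b, -1 if it loses, 0 on a tie."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def strategy_from_regrets(regrets):
    """Play actions in proportion to their positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

regrets = [[0.0] * ACTIONS for _ in range(2)]
strategy_sums = [[0.0] * ACTIONS for _ in range(2)]

for _ in range(20_000):
    strategies = [strategy_from_regrets(r) for r in regrets]
    moves = [random.choices(range(ACTIONS), weights=s)[0] for s in strategies]
    for player in range(2):
        played, opponent = moves[player], moves[1 - player]
        for a in range(ACTIONS):
            # Regret = how much better action a would have done than the action played.
            regrets[player][a] += payoff(a, opponent) - payoff(played, opponent)
            strategy_sums[player][a] += strategies[player][a]

average = [s / sum(strategy_sums[0]) for s in strategy_sums[0]]
print([round(p, 2) for p in average])  # approaches [0.33, 0.33, 0.33]
```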
DeepNash was trained to reach that equilibrium, and in the process it learned to maintain a varied playstyle (e.g. switching up its starting composition, randomizing its choice between equivalently evaluated moves) that opponents could not exploit.
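One small, hypothetical example of that kind of unpredictability: if several moves evaluate as equally good, sampling among them rather than always taking the first denies the opponent a pattern to latch onto. The move names and scores below are invented for illustration.

```python
import random

# Hypothetical sketch of unpredictable move selection; the move names and
# evaluation scores below are invented for illustration.
def pick_move(evaluated_moves, tolerance=1e-6):
    """evaluated_moves: list of (move, value). Randomize among near-equal best moves."""
    best = max(value for _, value in evaluated_moves)
    candidates = [move for move, value in evaluated_moves if best - value <= tolerance]
    return random.choice(candidates)  # equally good moves, so the choice stays unpredictable

print(pick_move([("advance scout", 0.12), ("probe flank", 0.12), ("retreat", -0.30)]))
```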
R-NaD is open source; check the GitHub page for more information. See the end of this post for links to the DeepNash/R-NaD paper.
DeepNash excels at Stratego, even against humans
DeepNash learned to evaluate positions beyond material advantage (e.g. trading pieces for information) and also seems to have mastered the art of bluffing (e.g. chasing known high-value pieces with hidden low-value ones), both core parts of Stratego and notably human-like behaviors.
DeepNash was tested against other Stratego-playing bots as well as humans. Against state-of-the-art Stratego-playing bots, DeepNash hit an overall 97% win rate. Against top human players on the website Gravon, DeepNash achieved a top-3 position on the annual and all-time leaderboards with an 84% win rate.
Find out more