Anil Ananthaswamy is the author of Through Two Doors at Once, among other books. His next book is about the mathematics of machine learning.
DeepNash has mastered an online version of the board game Stratego. Credit: Lost in the Midwest/Alamy
Another game long considered extremely difficult for artificial intelligence (AI) to master has fallen to machines. An AI called DeepNash, made by London-based company DeepMind, has matched expert humans at Stratego, a board game that requires long-term strategic thinking in the face of imperfect information.

The achievement, described in Science on 1 December¹, comes hot on the heels of a study reporting an AI that can play Diplomacy², in which players must negotiate as they cooperate and compete.
“The rate at which qualitatively different game features have been conquered — or mastered to new levels — by AI in recent years is quite remarkable,” says Michael Wellman at the University of Michigan in Ann Arbor, a computer scientist who studies strategic reasoning and game theory. “Stratego and Diplomacy are quite different from each other, and also possess challenging features notably different from games for which analogous milestones have been reached.”
Stratego has characteristics that make it much more complicated than chess, Go or poker, all of which have been mastered by AIs (the latter two games in 2015³ and 2019⁴). In Stratego, two players place 40 pieces each on a board, but cannot see what their opponent’s pieces are. The goal is to take turns moving pieces to eliminate those of the opponent and capture a flag. Stratego’s game tree — the graph of all possible ways in which the game could go — has 10⁵³⁵ states, compared with Go’s 10³⁶⁰. In terms of imperfect information at the start of a game, Stratego has 10⁶⁶ possible private positions, which dwarfs the 10⁶ such starting situations in two-player Texas hold’em poker.
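The gap between those game-tree sizes is easier to grasp in log space; the numbers below are the ones quoted above, and the code is just arithmetic:

```python
# The game-tree sizes quoted above (10^535 states for Stratego,
# 10^360 for Go) are far too large for ordinary floats, so the
# comparison is easiest done on the base-10 exponents directly.
stratego_log10_states = 535
go_log10_states = 360

# Stratego's tree is larger by a factor of 10^(535 - 360) = 10^175.
ratio_exponent = stratego_log10_states - go_log10_states
print(f"Stratego's game tree is ~10^{ratio_exponent} times the size of Go's")
```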
“The sheer complexity of the number of possible outcomes in Stratego means algorithms that perform well on perfect-information games, and even those that work for poker, don’t work,” says Julien Perolat, a DeepMind researcher based in Paris.

So Perolat and colleagues developed DeepNash. The AI’s name is a nod to the US mathematician John Nash, whose work led to the term Nash equilibrium, a stable set of strategies that can be followed by all of a game’s players, such that no player benefits by changing strategy on their own. Games can have zero, one or many Nash equilibria.
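To make the equilibrium property concrete, here is a minimal sketch (not from the paper) that checks it numerically for rock-paper-scissors, a game whose unique Nash equilibrium is for both players to mix uniformly over the three moves:

```python
# Row player's payoffs, rows/columns ordered rock, paper, scissors:
# +1 for a win, -1 for a loss, 0 for a draw. The column player's
# payoff is the negative (the game is zero-sum).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def expected_payoff(row_mix, col_mix):
    """Row player's expected payoff under mixed strategies."""
    return sum(row_mix[i] * col_mix[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

uniform = [1 / 3, 1 / 3, 1 / 3]
value = expected_payoff(uniform, uniform)  # 0 for this symmetric game

# The defining property of a Nash equilibrium: no player gains by
# unilaterally deviating. Checking every pure deviation suffices,
# since mixed strategies are averages of pure ones.
for a in range(3):
    pure = [1.0 if i == a else 0.0 for i in range(3)]
    assert expected_payoff(pure, uniform) <= value + 1e-12       # row player
    assert -expected_payoff(uniform, pure) <= -value + 1e-12     # column player

print("uniform mixing is a Nash equilibrium of rock-paper-scissors")
```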
DeepNash combines a reinforcement-learning algorithm with a deep neural network to find a Nash equilibrium. Reinforcement learning involves finding the best policy to dictate action for every state of a game. To learn an optimal policy, DeepNash has played 5.5 billion games against itself. If one side gets a reward, the other is penalized, and the parameters of the neural network — which represent the policy — are tweaked accordingly. Eventually, DeepNash converges on an approximate Nash equilibrium. Unlike previous game-playing AIs such as AlphaGo, DeepNash does not search through the game tree to optimize itself.
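The self-play loop described above can be sketched in miniature. The toy below uses classic fictitious play on matching pennies — a standard textbook scheme, not DeepNash’s actual algorithm, which instead tweaks a deep network’s parameters over billions of Stratego games — but it shows the same idea of two sides repeatedly adapting to each other until play settles near an equilibrium:

```python
# Matching pennies: the row player wins +1 on a match, -1 otherwise;
# the column player's payoff is the negative. In fictitious play,
# each side best-responds to the opponent's empirical frequencies,
# which converge towards the 50/50 Nash equilibrium.

def best_response_row(col_counts):
    # Row wants to match the column player's more frequent action.
    return 0 if col_counts[0] >= col_counts[1] else 1

def best_response_col(row_counts):
    # Column wants to mismatch the row player's more frequent action.
    return 1 if row_counts[0] >= row_counts[1] else 0

row_counts, col_counts = [1, 0], [0, 1]  # seed with one observation each
for _ in range(100_000):
    r = best_response_row(col_counts)  # both respond to the *old* counts
    c = best_response_col(row_counts)
    row_counts[r] += 1
    col_counts[c] += 1

row_freq_heads = row_counts[0] / sum(row_counts)
print(f"row player's empirical frequency of heads: {row_freq_heads:.3f}")
```

In a zero-sum game these empirical frequencies are guaranteed to converge to an equilibrium (Robinson’s theorem), mirroring the article’s point that one side’s reward is the other’s penalty.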
For two weeks in April, DeepNash competed with human Stratego players on the online game platform Gravon. After 50 matches, DeepNash was ranked third among all Gravon Stratego players since 2002. “Our work shows that such a complex game as Stratego, involving imperfect information, does not require search techniques to solve it,” says team member Karl Tuyls, a DeepMind researcher based in Paris. “This is a really big step forward in AI.”
“The results are impressive,” agrees Noam Brown, a researcher at Meta AI, headquartered in New York City, and a member of the team that in 2019 reported the poker-playing AI Pluribus⁴.
Brown and his colleagues at Meta AI set their sights on a different challenge: building an AI that can play Diplomacy, a game with up to seven players, each representing a major power of pre-First World War Europe. The goal is to gain control of supply centres by moving units (fleets and armies). Importantly, the game requires private communication and active cooperation between players, unlike two-player games such as Go or Stratego.
“When you go beyond two-player zero-sum games, the idea of Nash equilibrium is no longer that useful for playing well with humans,” says Brown.

So, the team trained its AI — named Cicero — on data from 125,261 games of an online version of Diplomacy involving human players. Combining these with some self-play data, Cicero’s strategic reasoning module (SRM) learnt to predict, for a given state of the game and the accumulated messages, the probable policies of the other players. Using this prediction, the SRM chooses an optimal action and signals its ‘intent’ to Cicero’s dialogue module.
The dialogue module was built on a 2.7-billion-parameter language model pre-trained on text from the Internet and then fine-tuned using messages from Diplomacy games played by people. Given the intent from the SRM, the module generates a conversational message (for example, Cicero, representing England, might ask France: “Do you want to support my convoy to Belgium?”).
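The two-module pipeline described above can be caricatured in a few lines of code. Everything here — the class names, the heuristic policy predictor, the template message — is an illustrative assumption, not Cicero’s real interface; the actual system uses a learnt predictor and a fine-tuned language model:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    action: str  # the move the agent wants help with
    ally: str    # the player the message is addressed to

class StrategicReasoningModule:
    """Hypothetical stand-in for Cicero's SRM."""

    def predict_policies(self, state, messages):
        # Stand-in for the learnt predictor: here, a trivial heuristic
        # that rates rivals who have messaged us as likelier cooperators.
        return {p: 0.8 if p in messages else 0.2 for p in state["rivals"]}

    def choose_intent(self, state, messages):
        policies = self.predict_policies(state, messages)
        ally = max(policies, key=policies.get)  # most likely cooperator
        return Intent(action="convoy to Belgium", ally=ally)

class DialogueModule:
    """Hypothetical stand-in for the fine-tuned language model."""

    def render(self, intent):
        # A template where Cicero generates free-form conversation.
        return f"{intent.ally}: Do you want to support my {intent.action}?"

state = {"rivals": ["France", "Germany"]}
messages = {"France": ["Hello England!"]}

srm, dialogue = StrategicReasoningModule(), DialogueModule()
intent = srm.choose_intent(state, messages)
print(dialogue.render(intent))
```

The key design point the sketch preserves is the one the article describes: strategy and language are separate modules, coupled only by the ‘intent’ passed between them.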
In a 22 November Science paper², the team reported that in 40 online games, “Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game”.
Brown thinks that game-playing AIs that can interact with humans and account for suboptimal or even irrational human actions could pave the way for real-world applications. “If you’re making a self-driving car, you don’t want to assume that all the other drivers on the road are perfectly rational, and going to behave optimally,” he says. Cicero, he adds, is a big step in this direction. “We still have one foot in the game world, but now we have one foot in the real world as well.”
Wellman agrees, but says that more work is needed. “Many of these techniques are indeed relevant beyond recreational games” to real-world applications, he says. “Nevertheless, at some point, the leading AI research labs need to get beyond recreational settings, and figure out how to measure scientific progress on the squishier real-world ‘games’ that we actually care about.”
doi: https://doi.org/10.1038/d41586-022-04246-7
Clarification 05 December 2022: This article has been updated to clarify Noam Brown’s role in the team that developed Pluribus.
1. Perolat, J. et al. Science 378, 990–996 (2022).
2. Bakhtin, A. et al. Science https://doi.org/10.1126/science.ade9097 (2022).
3. Silver, D. et al. Nature 529, 484–489 (2016).
4. Brown, N. & Sandholm, T. Science 365, 885–890 (2019).
© 2022 Springer Nature Limited