Computer scientists have developed a card-playing bot, called Pluribus, capable of defeating some of the world’s best players at six-person no-limit Texas hold’em poker, in what’s considered an important breakthrough in artificial intelligence.
Two years ago, a research team from Carnegie Mellon University developed a similar poker-playing system, called Libratus, which consistently defeated the world’s best players at heads-up (one-on-one) no-limit Texas hold’em poker. The creators of Libratus, Tuomas Sandholm and Noam Brown, have now upped the stakes, unveiling a new system capable of playing six-player no-limit Texas hold’em poker, a wildly popular version of the game.
In a series of contests, Pluribus handily defeated its professional human opponents, at a level the researchers described as “superhuman.” When pitted against professional human opponents with real money involved, Pluribus managed to collect winnings at an astounding rate of $US1000 ($1433) per hour. Details of this achievement were published today in Science.
Over the past several decades, AI researchers have had a lot of success developing machines capable of playing perfect-information, two-player, zero-sum games. That is, games involving head-to-head matches in which both players have complete knowledge of what’s happening in the game (e.g. chess players can see all the pieces on the board), and in which one player wins and the other loses.
By contrast, poker is an imperfect-information game, in which the players can’t be certain of which cards their opponents are holding and which ones are still in the deck. Other elements, like betting and bluffing, add to the game’s complexity and unpredictability. Add multiple players to the mix, and the complexity rises further still.
For AI researchers, poker presents a better model of the real world. Rarely in life do situations involve just one winner and one loser, or scenarios in which information is fully available. By improving an AI’s ability to deal with hidden information in multi-participant scenarios, computer scientists are dramatically expanding the domains in which AI can be used.
“While I am not focused on any particular application, I do think this research can be applied to a wide variety of settings such as cybersecurity, fraud detection, combating adversarial behaviour, and even having a self-driving car navigate traffic,” Brown told Gizmodo.
For the new study, Brown and Sandholm subjected Pluribus to two challenging tests. They first pitted Pluribus against 13 different professional players, all of whom have earned more than $US1 million ($1.3 million) in poker winnings, in the six-player version of the game. The second test involved matches featuring two poker legends, Darren Elias and Chris “Jesus” Ferguson, each of whom was pitted against five identical copies of Pluribus.
The matches with five humans and Pluribus involved 10,000 hands played over 12 days. To incentivise the human players, a total of $US50,000 ($71,662) was distributed among the participants, Pluribus included. The games were blind in that none of the human players were told who they were playing, though each player had a consistent alias used throughout the competition.
For the tests involving a lone human and five copies of Pluribus, each player was given $US2000 ($2866) for participating and a bonus $US2000 ($2866) for outperforming the other human participant. Elias and Ferguson both played 5000 separate hands against their machine opponents.
In all scenarios, Pluribus registered wins with “statistical significance,” and to a degree the researchers referred to as “superhuman.”
“We mean superhuman in the sense that it performs better than the best humans,” said Brown, who is completing his Ph.D. as a research scientist at Facebook AI. “The bot won by about five big blinds per hundred hands of poker (bb/100) when playing against five elite human professionals, which professionals consider to be a very high win rate. To beat elite professionals by that margin is considered a decisive win. It’s a bit tough to qualify this in a [simple] way… but one way to understand it is that if the bot were playing for real money, it would have won about $US1000 ($1433) per hour.”
And that’s against some of the world’s best poker players. Adorning Pluribus with superhuman status certainly seems justified, and Roman Yampolskiy, a computer scientist at the University of Louisville who wasn’t involved with the new work, agrees.
“The machine showed superhuman performance by defeating the best players in the world,” Yampolskiy told Gizmodo. “It obviously could defeat weaker players, meaning that it is superior to all humans making its performance unquestionably superhuman in this domain.”
For Yampolskiy, the achievement was significant because, “unlike chess or Go, the game of Poker has hidden information and the element of luck, meaning you can’t just outcompute humans, you have to outplay them,” he said. “Poker in particular has been an early sandbox for AI, and to show such a level of dominance in an unrestricted version of poker with many players has been a holy grail of research since the early days of AI.”
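Brown's conversion from big blinds per hundred hands to an hourly figure is simple arithmetic, and it can be sketched in a few lines. The article gives only the win rate (about 5 bb/100) and the result (about $US1000 per hour); the $US100 big blind and the pace of 200 hands per hour used below are assumptions chosen for illustration, not figures from the article.

```python
def dollars_per_hour(bb_per_100: float, big_blind: float, hands_per_hour: float) -> float:
    """Convert a win rate in big blinds per 100 hands into dollars per hour."""
    dollars_per_hand = (bb_per_100 / 100.0) * big_blind
    return dollars_per_hand * hands_per_hour


# Assumed inputs: the big blind size and the hands-per-hour pace are
# illustrative values, not numbers reported in the article.
rate = dollars_per_hour(bb_per_100=5.0, big_blind=100.0, hands_per_hour=200.0)
print(f"${rate:,.0f} per hour")
```

Under those assumed inputs the result lands at roughly the $US1000 per hour that Brown quotes; a slower table or smaller blinds would scale the figure down proportionally.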
To create a system capable of proficiently playing six-player no-limit Texas hold’em poker, Brown and Sandholm employed a grab bag of strategies, including new algorithms the duo developed themselves.
Before the competition started, Pluribus developed its own “blueprint” strategy, which it did by playing poker with itself for eight straight days.
“Pluribus does not use any human gameplay data to form its strategy,” explained Brown. “Instead, Pluribus first uses self-play, in which it plays against itself over trillions of hands to formulate a basic strategy. It starts by playing completely randomly. As it plays more and more hands against itself, its strategy gradually improves as it learns which actions lead to winning more money. This is all done offline before ever playing against humans.”
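The idea of a strategy improving through self-play can be illustrated with a toy example. The sketch below is not Pluribus's training code: it uses regret matching, a simple textbook self-play method, on a made-up two-action zero-sum game rather than poker. The payoff table and all function names are hypothetical. Both players start out choosing at random, and each keeps track of how much better every action would have done than its current play, shifting toward the actions it "regrets" not having chosen.

```python
# Hypothetical 2x2 zero-sum game: PAYOFF[i][j] is what player A wins (and
# player B loses) when A plays action i and B plays action j.
# Its equilibrium has both players choosing action 0 about 40% of the time.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: play uniformly at random.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations):
    """Train both players against each other; return their average strategies."""
    n = 2
    regrets_a, regrets_b = [0.0] * n, [0.0] * n
    sum_a, sum_b = [0.0] * n, [0.0] * n
    for _ in range(iterations):
        strat_a = strategy_from_regrets(regrets_a)
        strat_b = strategy_from_regrets(regrets_b)
        # Expected payoff of each pure action against the opponent's current mix.
        util_a = [sum(PAYOFF[i][j] * strat_b[j] for j in range(n)) for i in range(n)]
        util_b = [sum(-PAYOFF[i][j] * strat_a[i] for i in range(n)) for j in range(n)]
        value_a = sum(strat_a[i] * util_a[i] for i in range(n))
        value_b = sum(strat_b[j] * util_b[j] for j in range(n))
        for k in range(n):
            # Regret: how much better action k would have done than current play.
            regrets_a[k] += util_a[k] - value_a
            regrets_b[k] += util_b[k] - value_b
            sum_a[k] += strat_a[k]
            sum_b[k] += strat_b[k]
    return [s / iterations for s in sum_a], [s / iterations for s in sum_b]


avg_a, avg_b = self_play(50_000)
print("Player A average strategy:", avg_a)
print("Player B average strategy:", avg_b)
```

In this toy game, the strategies averaged over all rounds drift from the initial 50/50 split toward the 40/60 equilibrium mix. Pluribus faces a vastly larger game with hidden cards and six players, so this shows only the general flavour of learning from self-play.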
Once Pluribus had its blueprint strategy, the competitions could begin. After the first bets were placed, Pluribus calculated several possible next moves for each opponent, in a manner similar to how machines play chess and Go. The difference here, however, is that Pluribus was not tasked with calculating the entire game, as that would be “computationally prohibitive,” as noted by the researchers.
“In Pluribus, we used a new way of doing search that doesn’t have to search all the way to the end of the game,” said Brown. “Instead, it can stop after a few moves. This makes the search algorithm much more scalable. In particular, it allows us to reach superhuman performance while only training for the equivalent of less than $US150 ($215) on a cloud computing service, and playing in real time on just two CPUs.”
Even with a limited look-ahead strategy, Pluribus was still able to dominate its human opponents.
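The general idea of a search that stops after a few moves, rather than playing out every line to the end, can be shown on a much simpler game. The sketch below is an assumption-laden illustration, not the search Pluribus uses: it applies ordinary depth-limited look-ahead to a perfect-information stone-taking game (players alternately remove one to three stones, and whoever takes the last stone wins). The function names and the neutral placeholder estimate at the depth limit are hypothetical.

```python
def leaf_estimate(stones):
    """Placeholder value for positions the search does not expand.

    Assumed to be 0.0 ("unknown") here; a real system would plug in a
    learned or precomputed estimate instead.
    """
    return 0.0


def negamax(stones, depth):
    """Value of the position for the player about to move: +1 win, -1 loss."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone, so the mover has lost
    if depth == 0:
        return leaf_estimate(stones)  # stop early instead of playing to the end
    return max(-negamax(stones - take, depth - 1)
               for take in (1, 2, 3) if take <= stones)


def best_move(stones, depth):
    """Choose how many stones to take by looking only `depth` (>= 1) moves ahead."""
    legal = [take for take in (1, 2, 3) if take <= stones]
    return max(legal, key=lambda take: -negamax(stones - take, depth - 1))


print(best_move(5, depth=3), best_move(6, depth=3), best_move(7, depth=3))
```

With a look-ahead of only three moves, the search already finds the winning idea for small piles, which is to leave the opponent facing four stones. Poker adds hidden information, which is what makes the researchers' version of limited look-ahead considerably harder than this toy suggests.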
Importantly, Pluribus was also programmed to be unpredictable – a fundamental aspect of good poker gamesmanship. If Pluribus consistently bet tons of money when it figured it had the best hand, for example, its opponents would eventually catch on. To remedy this, the system was programmed to play in a “balanced” manner, employing a set of strategies, like bluffing, that prevented Pluribus’ opponents from picking up on its tendencies and habits.
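One common way to make play unpredictable is to pick among actions at random according to fixed probabilities, so that the same situation does not always produce the same move. The sketch below illustrates that general idea only; the actions and probabilities are invented for the example and are not taken from Pluribus.

```python
import random

# Hypothetical mixed strategy for a single situation: action -> probability.
MIXED_STRATEGY = {"fold": 0.2, "call": 0.5, "raise": 0.3}


def choose_action(strategy, rng):
    """Sample one action according to the strategy's probabilities."""
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the demonstration is repeatable
sample = [choose_action(MIXED_STRATEGY, rng) for _ in range(10_000)]
print({a: sample.count(a) / len(sample) for a in MIXED_STRATEGY})
```

Over many hands the observed frequencies settle near the target probabilities, yet an opponent watching any single hand cannot tell which action is coming.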
Some of the strategies used by Pluribus came as a surprise to the researchers, including an unorthodox strategy known as “donk betting,” which happens when a player ends one betting round by calling a bet, but then opens the next round with a bet of their own. Poker players generally consider donking a weak move that makes little strategic sense.
“The conventional wisdom is that if you are going to call [match the bet] and then bet [during the next round], then you might as well raise instead because it gives you more opportunities to get more money into the pot,” explained Brown. “Donk betting is something that weak players tend to do, though elite professionals acknowledge that it could, in theory, be a good action if done correctly in the right situations.
“However, doing it correctly without opening up exploitable weaknesses is typically too complicated for humans, even elite human professionals, so most only rarely if ever do it. Pluribus has found ways to donk bet much more effectively in a way that cannot easily be exploited.”
Pluribus also often made very large bets of a kind that human players typically avoid. Brown said this put Pluribus’ opponents into very difficult situations, which allowed the machine to make much more money with good hands than humans could.
Chris Ferguson, WSOP champion: Pluribus is a very hard opponent to play against. It’s really hard to pin him down on any kind of hand. He’s also very good at making thin value bets on the river. He’s very good at extracting value out of his good hands. So it’s been very hard playing against him. He’s really a very strong opponent.
Darren Elias: Its major strength is its ability to use mixed strategies. That’s the same thing that humans try to do. It’s a matter of execution for humans – to do this in a perfectly random way and to do so consistently. Most people just can’t. The bot wasn’t just playing against some middle of the road pros. It was playing some of the best players in the world.
Jason Les: I probably have more experience battling against best-in-class poker AI systems than any other poker professional in the world. I know all the spots to look for weaknesses, all the tricks to try to take advantage of a computer’s shortcomings. In this competition, the AI played a sound, game-theory optimal strategy that you really only see from top human professionals and, despite my best efforts, I was not successful in finding a way to exploit it. I would not want to play in a game of poker where this AI poker bot was at the table.
Jimmy Chou: Whenever playing the bot, I feel like I pick up something new to incorporate into my game. As humans I think we tend to oversimplify the game for ourselves, making strategies easier to adopt and remember. The bot doesn’t take any of these short cuts and has an immensely complicated/balanced game tree for every decision.
Sean Ruane: In a game that will, more often than not, reward you when you exhibit mental discipline, focus, and consistency, and certainly punish you when you lack any of the three, competing for hours on end against an AI bot that obviously doesn’t have to worry about these shortcomings is a gruelling task. The technicalities and deep intricacies of the AI bot’s poker ability was remarkable, but what I underestimated was its most transparent strength – its relentless consistency.
“Once again, AI managed to outperform humans without relying on any data from human play,” Yampolskiy told Gizmodo. “This means that machines can teach themselves to solve complex problems independently of human supervision.”
Yampolskiy wasn’t surprised by how well Pluribus performed, though he would have liked to have seen Pluribus play standard 10-player games, and without having to abide by betting restrictions (unlike its human opponents, Pluribus was not allowed to make bets above $US10,000 ($14,332)).
What does surprise Yampolskiy, however, is that there are still some games in which computers are not superhuman in terms of their performance. As to where this type of AI could be applied in the future, Yampolskiy said similar techniques could be used “to outperform humans in negotiations, trading, and game-like competitions such as war strategy.”
To which he added, perhaps ominously: “Essentially, any skill which could be represented as a game-like situation can be dominated by superhuman AI.”