How Backgammon Led To Race For The Galaxy's Incredibly Clever AI

There's a ton of work needed to make any game successful, but if you're digitising a board or card game, the hardest part to get right is the AI. It's the field that can take the most time and money, but it's utterly essential if any developer wants players to keep coming back.

Thing is, developers working on those projects don't have millions of dollars at their disposal. And that was the problem facing Temple Gates Games, the studio responsible for making the PC, iOS and Android versions of Race for the Galaxy. So to give the engine builder the best chance of success, the developers looked at something a little more ancient: backgammon.

Theresa Duringer, co-founder and developer at Temple Gates Games, explained in a GCAP talk during Melbourne International Games Week how the company approached one of the most common problems for digital board games: bad AI. Digital board games have always struggled to give players a thorough challenge, often because of resourcing. Handwritten AI code is harder to maintain and manage, and it often misses the edge cases and incremental chances of victory that advanced players instantly find. That matters because the hardcore fans of games like Patchwork or Agricola are also the people most likely to buy a digital version of the game immediately.

So Race for the Galaxy, an engine building card game first released physically in 2007, was built with a different plan in mind: machine learning.

As Duringer explained, the idea didn't come from the studio itself. The person responsible was actually Keldon Jones, the creator of the fan-made web versions of Blue Moon and Race for the Galaxy. Jones had made a digitised version of RFTG before, and not only that, he'd incorporated his own machine learning model as far back as 2009.

The neural network, Duringer said, was an adaptation of TD Gammon. TD Gammon was made by IBM researcher Gerald Tesauro back in 1992, and the exciting thing about the program was its temporal difference learning algorithm.

What made TD Gammon's model so appealing for Race for the Galaxy, both in Jones' application and from the standpoint of a mobile developer, is that the neural network trains itself from scratch.

In the most familiar approach to training a neural network, it's fed images or data that a human has verified as "correct", which Duringer illustrated with photos of cats. Humans give the AI pictures of cats and say, "This is a cat," and the neural network learns from that, continually comparing other images against those verified examples to learn what is, and what isn't, a cat.

But TD Gammon and Race for the Galaxy don't need to do that. With no prepopulated data and no human "teacher" to verify the best moves against, the AI learns by playing itself. The network receives inputs and gets a reward signal generated by whether the game was won or lost. There's even a trick where the network's own prediction of which player will win can be fed back in as a training target, rather than having to wait for the end result of the game. The benefit of this, Duringer explained, was that the neural network could be trained on every single turn, which also meant the AI could make smarter decisions from the start of the game.
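To make the "train on every turn" idea concrete, here is a minimal sketch of temporal difference learning on a toy game. It is not Jones' or Temple Gates' code: the game (a coin-flip walk between a losing end and a winning end), the lookup table standing in for the neural network, and the names `train_td0`, `alpha` and `n_states` are all assumptions made for illustration.

```python
import random


def train_td0(n_states=5, episodes=5000, alpha=0.05, seed=0):
    """Learn win-probability estimates for a toy game with TD(0).

    Positions 1..n_states are in play; position 0 is a loss and
    position n_states + 1 is a win. A table stands in for the network.
    """
    rng = random.Random(seed)
    values = [0.5] * (n_states + 2)    # initial guess: every position is 50/50
    values[0], values[-1] = 0.0, 1.0   # terminal positions hold the real result

    for _ in range(episodes):
        state = (n_states + 1) // 2    # every game starts in the middle
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))  # one "turn" of play
            # The training target is the prediction for the next position,
            # or the actual result if the game has just ended.
            target = values[next_state]
            values[state] += alpha * (target - values[state])
            state = next_state
    return values


if __name__ == "__main__":
    for position, value in enumerate(train_td0()):
        print(position, round(value, 2))
```

Each update nudges the estimate for the current position toward the estimate for the position that followed it, so the result of a finished game gradually flows back to the positions that came earlier.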

"If you get the final reward signal ... you can still propagate that information to previous turns so your AI can be driven to make smart decisions as early as turn 1," Duringer said.

In an old Board Game Geek post, Jones explained more of the specifics in his web version:

You can view what the neural net thinks of each player's chances to win with the "Debug AI" dialog box. It's the second of the two tables. Each row corresponds to computing the eval network from the point of view of that player (certain things are known only to a given player, such as cards in hand). The columns show the likelihood of winning.

Now, the most important decision is arguably which action to choose at the beginning of each turn. This is handled a bit differently. First, the AI predicts what each opponent is likely to do. This is done with a second neural net (the "role" network). This network is very similar to the eval network, except that it has an output for each possible action choice a player may make. The predictions from this network are visible from the Debug AI dialog as the first table.

Once we think we know what the opponents will do, we simulate choosing each action ourself and run through the entire turn, then check the eval network at the end to see how well that action worked (we weight these scores by the probabilities the opponents will act in this way). This is why, for example, the AI may call Consume-Trade with no goods available. If there's a high chance an opponent will call Settle, and we have a windfall world that we can afford to play, this is a reasonable play.

So, the AI relies entirely on emergent behaviour; there are no preset scripts. However, the learning by playing against identical copies of itself probably does cause some group-think like shortcomings.
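The weighting step Jones describes can be sketched roughly as follows. This is only an illustration under stated assumptions: there is a single opponent, the probabilities and win estimates are made-up numbers, and `expected_score`, `choose_action` and `toy_eval` are hypothetical names standing in for the role network, the turn simulation and the eval network.

```python
def expected_score(my_action, opponent_probs, eval_after_turn):
    """Average the end-of-turn evaluation over the opponent's predicted actions."""
    return sum(
        prob * eval_after_turn(my_action, opp_action)
        for opp_action, prob in opponent_probs.items()
    )


def choose_action(my_actions, opponent_probs, eval_after_turn):
    """Pick the action with the best probability-weighted evaluation."""
    return max(
        my_actions,
        key=lambda action: expected_score(action, opponent_probs, eval_after_turn),
    )


# Made-up stand-in for the "role" network's output: the opponent probably settles.
OPPONENT_PROBS = {"Settle": 0.7, "Explore": 0.3}

# Made-up stand-in for simulating the turn and then asking the eval network.
EVAL_TABLE = {
    ("Consume-Trade", "Settle"): 0.60,   # we place a windfall world, then trade its good
    ("Consume-Trade", "Explore"): 0.30,  # nothing to trade, the action is wasted
    ("Explore", "Settle"): 0.45,
    ("Explore", "Explore"): 0.45,
}


def toy_eval(my_action, opp_action):
    return EVAL_TABLE[(my_action, opp_action)]


best = choose_action(["Consume-Trade", "Explore"], OPPONENT_PROBS, toy_eval)
```

With these invented numbers, Consume-Trade comes out ahead even though it is wasted when the opponent explores, which mirrors the windfall-world example in Jones' post.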

The model means that the neural network only needs to know the rules of the game, which is great for a board game with fixed rules and fixed end states. It lowers the cost of development too, since less time has to be spent verifying what the AI is learning, and it removes most of the bias that would otherwise come from the developers, who would naturally teach an AI what they think are the best strategies or ways of playing a game. Removing that bias also means the AI can learn to be better at the game than the developers themselves.

"AI plays very differently than people," Duringer explained. "People tend to follow a colour, and that can be a strong strategy, but the AI doesn't have the same allegiance to a particular colour and it tends to be more effective at playing the game, and you can see people's play patterns following the AI a little bit more."

One downside of this model is that the AI has to play a huge number of games before it starts learning anything useful. But after the first 1000 games or so, it began to develop context-sensitive strategies. Instead of taking the highest-value victory point card at the start of the game, the neural network would take cards that were easier to play in the early stages and that wouldn't destroy its economy. Duringer added that all of this took place without feeding player decisions back into the network.

She explained that having a super intelligent AI capable of challenging the best players was also a better scenario all round. "It's a lot easier to make a hard AI into an easy one, instead of making a dumb AI smart," Duringer said. Because the neural network makes decisions based on a score output, the developers were able to create different difficulty levels by adding some noise to the decision-making process. In practice, that meant that a medium AI in Race for the Galaxy should beat its hard AI counterpart around 25 percent of the time.
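One way to picture difficulty-by-noise is the sketch below. The talk only says that noise was added to the decision-making process, so the Gaussian noise, the `noise` parameter, the move names and the scores here are all assumptions rather than Temple Gates' actual tuning.

```python
import random


def pick_move(scores, noise, rng):
    """Choose the move with the highest score after adding random noise.

    noise is the standard deviation of the perturbation: 0.0 always plays
    the top-scored move, larger values make weaker moves more likely.
    """
    noisy = {move: score + rng.gauss(0.0, noise) for move, score in scores.items()}
    return max(noisy, key=noisy.get)


def best_move_rate(scores, noise, trials=2000, seed=1):
    """Fraction of trials in which the truly best move is chosen."""
    rng = random.Random(seed)
    best = max(scores, key=scores.get)
    hits = sum(pick_move(scores, noise, rng) == best for _ in range(trials))
    return hits / trials


# Made-up network outputs for three candidate moves.
SCORES = {"settle": 0.60, "develop": 0.50, "explore": 0.20}

if __name__ == "__main__":
    print("hard AI picks the best move:", best_move_rate(SCORES, noise=0.0))
    print("easier AI picks the best move:", best_move_rate(SCORES, noise=0.3))
```

The appeal of this design is that one trained network serves every difficulty level: only the amount of noise changes.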

But not every board game would benefit from this kind of model. Duringer explained that their adaptation of the groundwork laid by TD Gammon works best with games that end after a certain number of turns. With something like Chess, where stalemates are possible, or games where the spatial relationship between objects or pieces plays a huge factor, you run into scenarios where the neural network has no final "win" or "loss" signal to learn from.

Technically, this problem can happen in Race for the Galaxy as well: players or the AI can keep drawing cards rather than playing more planets or depleting the victory points. Normal people wouldn't do it, but the AI would naturally try such a method at some point just through sheer repetition. So the developers introduced a fake rule whereby the game ends after 30 turns. As far as the neural network is concerned, if the AI has just drawn cards all that time, the game ends and the AI stops favouring that strategy because it didn't win.
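A rough sketch of how such a cap guarantees a training signal is shown below. It is a heavily simplified, single-player stand-in: `play_training_game`, the "draw" and "build" actions and the 12-card goal are assumptions for illustration, and only the 30-turn limit comes from the talk.

```python
MAX_TURNS = 30  # the artificial cap described in the talk


def play_training_game(choose_action, tableau_goal=12, max_turns=MAX_TURNS):
    """Play one simplified game and return the reward used for training.

    choose_action(turn) returns "build" or "draw". Reaching tableau_goal
    counts as a win (1.0); hitting the turn cap first counts as a loss (0.0),
    so every game produces a final signal.
    """
    tableau = 0
    for turn in range(max_turns):
        if choose_action(turn) == "build":
            tableau += 1
        if tableau >= tableau_goal:
            return 1.0
    return 0.0  # the cap was reached, so stalling is scored as a loss


def always_draw(turn):
    return "draw"


def always_build(turn):
    return "build"


if __name__ == "__main__":
    print("stalling reward:", play_training_game(always_draw))
    print("building reward:", play_training_game(always_build))
```

Because the stalling policy always ends on the cap with a losing reward, a learner trained on these outcomes has a reason to stop favouring it.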

What Temple Gates has created is effective enough that they've begun exploring what other board games it could be adapted for. There's a ton of tweaking that has to be done along the way, particularly as the AI begins to metagame itself, reducing its effectiveness against the strategies that real humans deploy and validating Jones' original concern about AI "group-think".

But the work done on the neural network is helping immensely with the digital port of Roll for the Galaxy, a dice-based sequel that came out in 2014. And it's fascinating to ponder how many other small studios and games could be supercharged with the right kind of machine learning model. A few games outside of tabletop and board games have already dabbled with some forms of AI learning and avatars — Killer Instinct's Shadow Mode comes to mind — but we're really only just scratching the surface of what's possible.


The author's accommodation through Melbourne International Games Week and PAX Australia was provided courtesy of Airbnb for Work.


Comments

    I wonder what would happen if you built an AI like this that deliberately didn't understand the more complex rules, so it was a gun at playing the obvious game but it never, ever sees certain synergies coming.
