What Happens When AI Tries To Review A Video Game

Image: The Good Life / Kotaku Australia

It’s a comment I’ve seen hundreds of times, in one variation or another, throughout my time here at Kotaku: a complaint about the quality of reviews. “A bot can do better than this,” some would cry. So let’s put that to the test.

I’ve run this test before, although last time I fed Kotaku Australia comments into the machine learning model. That run used a free online version of the GPT-2 language model; the more powerful GPT-3 model is available now if you’re willing to pay to access the API.

So I did that, specifically through a tool called Shortly. We got some fun responses last time, when the AI pretended to double as a commenter. What happens when we ask it to review video games?

To do this we’ll need a bit of a methodology. GPT-2 and GPT-3 models work by spitting out paragraphs, using a bunch of text you provide as a starting point. The neural network compares that text against the patterns it picked up in training to try and work out the tone and direction. Because of how that works, I’ve found machine learning tends to do best with public knowledge, rather than poetic licence. The art of a review is illustrating an experience, unpicking what it’s trying to do, what it is, and hashing out the difference between those things. Everyone does their reviews differently, of course, which is why people end up reading multiple reviews and comparing notes.
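To make the "continue from the text you provide" idea concrete, here is a minimal sketch in Python. It is a toy stand-in and not how GPT-2 or GPT-3 actually work: those are large neural networks, while this is a simple word-pair lookup table. The function names and the tiny corpus are made up for illustration. The principle is similar in spirit, though: the prompt sets the starting point, and everything after it is guessed from patterns in other text.

```python
import random
from collections import defaultdict


def build_bigram_table(corpus: str) -> dict[str, list[str]]:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def continue_prompt(prompt: str, table: dict[str, list[str]],
                    max_new_words: int = 20, seed: int = 0) -> str:
    """Extend the prompt one word at a time using the bigram table."""
    rng = random.Random(seed)  # seeded so repeated runs give the same text
    output = prompt.split()
    if not output:  # nothing to continue from
        return ""
    for _ in range(max_new_words):
        candidates = table.get(output[-1])
        if not candidates:  # last word was never followed by anything: stop
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical "training" text; a real model learns from vastly more.
    corpus = "the game is dark and the game is hard and the world is dark"
    table = build_bigram_table(corpus)
    print(continue_prompt("the world", table, max_new_words=5))
```

Even at this scale you can see the failure mode that shows up later: the continuation always sounds plausible given the words before it, but nothing in the process checks whether it is true.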

So to make this work as well as it can, I’m going to feed the model chunks of different reviews, each with a very specific goal. Some of them will be largely mechanical, while some of them will be exclusively talking about a moment or part of that particular experience. With luck, we’ll get some interesting results.

Cyberpunk 2077 (using a Kotaku Australia review)

Screenshot: CD Projekt Red / Kotaku Australia

What I fed the model:

The quirks that remain after Cyberpunk’s major patches are both minor and cripplingly major. I’ve had NPCs attempt to charge my position, only to quite literally fall in front of me as they fail to navigate a body. Loot will be littered across a level after a fight, but some of it will remain left behind, because the game refuses to let me pick it up. I’ve had the game determine that I’ve committed a crime despite not touching a nearby civilian, causing police to immediately spawn behind me, like they’ve just emerged from a secret closet à la DOOM. Some have been more convenient, but immersion breaking, like mini bosses and higher level enemies frozen in position or stuck to the floor.

What the AI produced:

In one case, I stood inside a dead end as enemies fought from the next room. One was an armoured giant wielding a flamethrower, and the other a flying enemy capable of quickly dealing chip damage to my health. The flying enemy attempted to exit through a hole in the wall only to fail repeatedly, repeatedly getting caught by the geometry, completely ignoring his health bar as he was forced to stop and rest after each attempt. Despite this being something that could not feasibly happen in any world I live in, it’s also very much Cyberpunk 2077. In its current state, I wouldn’t have it any other way.

Cyberpunk 2077 shares a lot with its Cyberpunk 2020 counterpart. Aside from the name, the story takes place in a dystopian future where poor people live in slums and slums rise up into high-rises, with cybernetic augmentation being considered a luxury item. In this version, humanity has been divided into classes called “Meths” — those who have access to technology and those who don’t. The players will be able to choose between being a member of the Corporate Security Contractor (CSC) or the Freelancers, an outlaw group that operates outside of government control.

I mean, that’s not too bad. Meths sounds like a legitimate term a developer would use, although there’s the tell-tale sign of small details spinning out of control. People rising up to assume power, money, and control from nothing totally fits the vibe, too. Although if a human was writing it, they’d emphasise the people taking charge, not the actual physical slum taking over the skyscrapers. That’s a bit too Fight Club, I think.

Returnal (using a Kotaku review)

Screenshot: Housemarque / Kotaku Australia

What I fed the model:

You play Selene Vassos, an interstellar scout working for a space agency called Astra. Returnal opens with Selene approaching Atropos, an uncharted planet. Atropos looms large and ominous, its pearl-white weather systems swirling over a planet too massive to fit entirely in view. In short order, a storm disables her spaceship, the Helios, sending it careening through the cloud cover, where it crash-lands in a damp Atropian forest.

Selene finds her sidearm on a deceased Astra scout and quickly notices that the corpse shares her callsign and spacesuit. Huh. Moving on. Astra protocol dictates that Selene can only be rescued if she reaches a far-off radio signal, so her one and only priority is figuring out how to get there.

What the AI produced:

But this is easier said than done. The game is set in the year 2048, but Returnal doesn’t feel like science fiction. It uses advanced technology to emphasise the harshness of its world, rather than to create anything particularly exciting or new. You won’t find any cool weapons or gadgets here; Selene has only the tools she starts with (her sidearm and a flashlight). The game takes place in darkness (and occasionally rain), which makes it difficult to see much further than your flashlight’s narrow cone of light.

Returnal’s minimal story is told mainly through logs scattered around the forest, using an awkward combination of short comic panels and character portraits.

I don’t really know where the model got the year, or the idea of a sidearm and a flashlight, but at least it’s on point about the darkness.

For the first couple of levels, anyway.

The Good Life (using a review from Rock, Paper, Shotgun)

Image: White Owls / Steam

By the way, if you have the chance, read the review in full. It’s fantastic.

But for this one, I wanted to feed the model a slightly larger chunk of text to see if it would produce something more tonally accurate. At this stage, it doesn’t seem like the machine learning model pulls from the sites the reviews are drawn from, so I wondered if a little more context would help.

What I fed the model:

You play Naomi Hayward, a reporter sent from New York to work off her debts while digging up dirt on the postcard-perfect town of Rainy Woods. Your first scoop is handed to you on night one. During each new moon the residents transform into friendly cats and dogs, who wander about the streets eating out of bins and wearing tiny hats and little waistcoats and such. You can pet them if you like, and eventually, after drinking a potion offered to you by a stranger, you can turn into a cat or a dog yourself.

Somehow, that little mystery is not what The Good Life is all about. The village being entirely populated by a cast of shapeshifting night-creeps doesn’t come up very often, partly because of how quickly a whole other mystery comes tumbling out of the mystery chute. There’s been a ritual murder of a beloved local, and it’s up to you–and certainly not the police or anyone who knows the victim–to solve the crime.

From this point onward you’re free to explore Rainy Woods at your leisure, to chat with locals, transform into a cat to climb a handful of specific walls, and take on various quests about town. Days pass into night, characters follow routines, you have a garden where you can grow your own food and a shop where you can buy clothes. As a photographer you earn cash by uploading pictures to social media, earning bonus points for snapping particular objects around town.

What the AI produced:

It’s never made clear why there are so many cats and dogs in Rainy Woods, given that they’re mostly just the pets of the human townsfolk. But it’s a small and unlikely question alongside the main mystery: who killed Tom Nook? The death is couched as a mystery, but what it really is–or what I found it to be–is an obstacle. As Naomi you’re compelled to write about the town and its citizens, and it falls to you to solve mysteries such as this one. Your only way of interacting with the townsfolk is through photographing them.

OK. If there were a world where Nintendo was somehow convinced to let Swery make a video game where Tom Nook is killed, the internet would explode.

Sable (using a Kotaku U.S. review)

Screenshot: Raw Fury / Kotaku Australia

What I fed the bot:

Sable, developed by Shedworks and published by Raw Fury, is a special game. The press and indie scene alike have treated it as such for a long time, and it’s no wonder why. Its visuals are breathtaking. Its atmosphere is impeccably crafted. The game’s soundtrack is by Japanese Breakfast for God’s sake. And it is designed in explicit conversation with the best games of the last decade, most notably Breath of the Wild. It was all but chemically engineered to be an indie darling. I am glad that it is both more, and less, than many people’s expectations.

What the AI produced:

The premise: you play as a loveable alcoholic bum, frequently getting kicked out of hostels. As such, the game has a wry and affectionate tone, like you’re having a drunken argument with a friend, but this is only exacerbated by the fact that most of your dialogue is kept up in game text, as you wander around. This makes it especially enjoyable when you are forced to rely on your own ingenuity to proceed. I will say that while I liked this style of game dialogue in Pillars of Eternity, Breath of the Wild isn’t exactly the same.

As you travel around Sable, it becomes apparent that your clientele is of mixed moralities. Some are complicit in the system and the hostel that they stay at is more like an involuntary holding pen. Others are desperate and in love, but fearful of each other’s agency. When you look around the buildings, you see all kinds of characters, each with their own amusing quirks. There’s the perpetually stoned bartender, the blind worker, and the dominatrix. You will go on so many shitty, hilarious adventures in this game, it’s astounding.

Uh, what?

Metroid Dread (using a Digitally Downloaded review)

Image: Nintendo

What I fed the bot:

What made me want to stop–and frequently–was the rest of it. Metroid Dread is organised in the non-linear way that the series is known for, but because it is so big, and so utterly in love with how clever it thinks its trail of breadcrumbs is, that I would spend huge chunks of time pouring over a very non-intuitive map trying to figure out just what I needed to do next. There was one particularly galling incident, which highlights just how obnoxious Metroid Dread can be (though it rarely goes this far): I had just cleared the first zone and got access to the second. However, in that zone the two doors I could access each led directly to excessive heat zones, which Samus couldn’t survive at that point. So, assuming that I needed to complete something in the (still not 100 per cent completed) first zone, I headed back to poke around and figure out what I had missed.

What the AI produced:

Still nothing, so I did it again. Then back to the second zone for a fourth fruitless exploration. By the time I had done this for a fifth time, which was when I started to get truly angry, it finally dawned on me that there was nothing left to do in the first zone.

This did not improve my mood at all. But what made it worse was something else.

I had started Metroid Dread in preparation for a Nintendo Switch review. I’d been playing it on the big screen, because that’s how I prefer to play Metroid games, but also because, well, that’s how they’re meant to played. The problem was that the Switch version of Metroid Dread has some glaring technical issues that just made the experience even more irritating than it would already be. For one thing – and this is a big thing–it is really hard to see where you’re going in certain places. This is not only in the heat zones (which dim and distort the graphics), but also in areas in which Samus can survive (such as when she meets up with fellow bounty hunter Sylux once again).

I mean, props for pulling Sylux completely out of nowhere. I don’t think anyone spent any time drawing connections between Metroid Prime Hunters and Metroid Dread, but I’m glad the AI did.

Cruelty Squad (using a PC Gamer review)

Image: Consumer Softproducts / Steam

What I fed the bot:

Cruelty Squad is Deus Ex if it were made today, the natural product of furious people exhausted by wealth inequality, police militarisation, and the stubborn structures that keep humanity rolling towards total annihilation of the soul. Yeehaw.

But Cruelty Squad wants to have fun before the inevitable end. It’s a stealth action game that props up the pure, lizard-brained joy of feeling like I outsmarted the designers through wild experimentation, even if I’m doing exactly what they expected. Like Hitman, it’s a cathartic exercise in taking out the absolute worst people alive. It’s an audiovisual marvel, a virtual world disintegrating in front of your eyes. And it’s one of the most brilliantly absurd games I’ve played in a minute, a vision of the future where people are considered corporate subsidiaries and the weapon market ebbs and flows based on what guns are licensed for use in popular anime.

What the AI produced:

Cruelty Squad is the brainchild of Sloclap, a studio whose last game was the fantastic medieval brawler For Honour. It’s a stealth action game with old-school sensibilities, and I mean that in the best way possible. It does not need to do anything else with its concept; this is a simple premise that’s executed to perfection. That’s why it’s such a surprise when I discover that there are layers and layers of complexity here if you give it time. It’s a game that rewards both patience and impulsiveness, and the only thing standing in the way of mastery is your own ability to learn from what doesn’t work.

Cruelty Squad is a ’90s-style stealth game that takes place in the near future. It looks like Blade Runner meets Ghost In The Shell, and it has an immaculately cohesive aesthetic. Every single level has its own unique colour scheme and graphical filter, something I’ve never seen done so successfully before. It’s gorgeous; if someone were to make a movie or music video about this world, everything would be dialed in exactly as it is here.

What’s interesting here is the machine learning model’s penchant for just making shit up. Sloclap exists as a developer: they’re the makers of Sifu and Absolver, and have precisely bugger all to do with Cruelty Squad or For Honor. It’s interesting to me that the model would pull that detail out when it’s something that’s so easily checkable.

It would be more on point if I fed it larger chunks of the original review, but I’m wary of upsetting the original writer by lifting too much of their work, even though this is for a completely different context and it’s all in good fun. Still, it’s a good indicator of what machine learning models really struggle with: not just contextually relevant detail, but detail that is accurate.

Red Dead Redemption 2 (using a Kotaku review)

Image: Rockstar Games

For this, I wanted to see what the AI could do if it was given a significant chunk of text. Would it be more accurate this time, or pull data from better sources? And what would it do with a chunk of text that was more human, less concerned with concrete mechanical details and more with the emotion it conveyed?

For this one, I also asked Shortly to set up the page as if I was writing a story “or any other creative piece,” as opposed to the option for writing blogs, articles and non-fiction work.

What I fed the bot:

It has long been an open secret in the games industry that Rockstar’s studios embrace a culture of extreme work, culturally enforced ‘voluntary’ overtime, and prolonged periods of crunch. The ‘secret’ part of that open secret evaporated somewhat over the past week, as a controversial comment by Rockstar co-founder and Red Dead Redemption 2 writer Dan Houser set off a cascade of revelations about work conditions at the notoriously secretive company.

Over the past month, my colleague Jason Schreier spoke with nearly 90 current and former Rockstar developers, and his report on the matter paints a picture of a vast and varied operation that, for all its talk of change, has clearly spent years embracing and profiting off of a culture of exorbitant overwork that even many who say they are proud to work at Rockstar want to see changed.

Play Red Dead Redemption 2 for just a few minutes, and the fruits of that labour will be immediately apparent. This wonderful, unusual game was clearly a titanic logistical undertaking. Every cutscene, every railroad bridge, every interior, every wandering non-player-character has been polished to a degree previously only seen in more limited, linear games.

If Naughty Dog’s relatively constrained Uncharted 4 required sustained, intense crunch to complete, what must it have taken to make a game a hundred times that size, but with the same level of detail? As critic Chris Dahlen once put it while ruminating on how much easily missable, painstakingly sculpted work is included in the average big-budget game, ‘That’s some fall of the Roman Empire stuff right there.’

I sometimes struggled to enjoy Red Dead Redemption 2’s most impressive elements because I knew how challenging — and damaging — some of them must have been to make.

Yet just as often, I found myself appreciating those things even more, knowing that so many talented people had poured their lives into crafting something this incredible.

What the AI produced:

Of course, some developers working on the game faced truly agonising conditions; some had to receive some kind of medical treatment while working on it (which Rockstar initially attempted to classify as ‘sick time,’ before changing course). Some told of working grotesquely long hours for no extra pay, while others had to deal with management ‘incentivising’ or ‘articulating’ their own productively overworking. One anonymous source told of arriving at work outside of his contract’s 12-hour day, only to receive another call telling him to come in right away, only to leave him there for another 10 hours.

When the crunch is over, some hope to go home. But hey, at least you got to play Red Dead Redem…oh, wait.

The game, to be clear, was ambitious, beautiful, and well worth it. But many who worked on Red Dead Redemption 2 undoubtedly traded years of their lives for the chance to make it. And after the game was released, it seemed that Rockstar intended to keep them at their desks crunching, rather than allow them to wrap up, relax, and focus on the game that they’d spent years working on.

Rockstar’s initial statement about Dan Houser’s quote — that it had been ‘confusing,’ yet he’d only meant to refer to the writing staff, not the whole company — seemed plainly contradicted by the full quote. Houser had said not only of ‘100-hour weeks,’ but ‘we were working 100-hour weeks,’ one of the most egregious of all business clichés.

Yet Rockstar did nothing until two days had passed, until the Wall Street Journal had written an article about its comments — which apparently got its hands on documents laying out its overtime policies.

This is probably the case where the AI produced the most readable content, even though there’s a clear lack of checks and balances: there’s no prompt or reason for the model to be manufacturing anonymous sources out of thin air. And the Wall Street Journal did write a story about Red Dead Redemption 2, but not about the comments specifically. That piece was more about the sustainability of video game revenue and how investors would respond.

So out of all of this, it’s pretty clear to me: AI can’t review a video game. The elements where the machine learning model was asked to riff on experience and emotion actually turned out better than expected, but at almost every opportunity it read like the model adopted the “fake it till you make it” approach. It’d introduce details that were wrong, sometimes even names and developers from different projects entirely.

But it’s interesting to see just how far AI-generated content has come. I imagine we’re not far off the day where some outlets or news wires start dabbling with GPT-3 generation for press release material, simply because the sheer volume of content online outstrips the number of people available to write it (but not the potential readership).

When it comes to reviews at least, you can’t beat the human touch. People know best what elements matter to other people. AI will get there one day, but that day isn’t today.

This article originally appeared on Kotaku Australia.
