Watch An AI Learn To Play Mario, Live

On TikTok, between the “get ready with me” videos, life hacks, and memes, a few robots are working on a challenge that many of us have faced at some point in our lives: beating Super Mario World. Over the past week, users have been live streaming an AI’s attempts to learn to play Mario, and for one robot in particular, it’s going great. Its name is Rupert, and it just beat level 2.

The AI’s strategy will be familiar to anyone who remembers their first time wielding a Super Nintendo controller. Rupert runs, jumps, slams into enemies, falls off cliffs, and dies—over, and over, and over. Every time it dies, Rupert tries again. Usually, it makes almost the exact same moves that killed it in the last round. But if you watch long enough, you’ll notice Rupert is evolving and getting better. It’s learning.

“It’s a program that is made to simulate natural selection with neural networks,” said Join The PCMasterRace, the TikTok user responsible for Rupert, who asked not to use his real name. (PCMasterRace is the objectionable name of a subreddit about desktop computers.)

In other words, Rupert is a system of machine learning algorithms that gets better by watching its own mistakes. Rupert has a set objective: get to the other end of the level. It knows which buttons it can push and it can see what’s happening on the screen. (You can actually see what Rupert “sees” in the top left of the video below.) But unlike a human Mario operator, an AI can’t just make assumptions that it should avoid Koopas or try not to fall off a ledge. All Rupert has is positive and negative feedback. Essentially, Rupert tries things at random. It remembers what did and didn’t work, and its strategy improves over time.

Rupert is modeled after evolution in the sense that it works using “species” and “generations.” The AI tries a particular strategy for each species, which lasts about two to six runs. For every 50-100 species, the AI collates what it learned into a “generation.”

As the AI plays, it gets a “fitness” score. Fitness goes up the farther Mario gets to the right and the faster he gets there. The generations with higher fitness are selected to be “bred” for future generations, meaning the AI carries the behavior and patterns that worked into a fresh round of attempts. That allows its decision-making to get more sophisticated and complex over time.
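For the curious, here is a rough idea of what "score, select, breed" can look like in code. This is a toy sketch, not MarI/O's actual code: MarI/O evolves neural networks, while this example evolves plain lists of button presses in a made-up one-dimensional "level." Every name here (GOAL, fitness, breed, evolve) and every number is an assumption chosen for illustration.

```python
import random

BUTTONS = ["left", "right", "up", "down", "A", "B", "X", "Y"]
GOAL = 10  # hypothetical distance to the end of the toy level


def random_genome(rng, length=20):
    """A hypothetical 'genome': a fixed-length sequence of button presses."""
    return [rng.choice(BUTTONS) for _ in range(length)]


def fitness(genome):
    """Toy stand-in for the game: farther right is better, fewer frames is better."""
    x = 0
    frames = 0
    for button in genome:
        frames += 1
        if button == "right":
            x += 1
        elif button == "left":
            x = max(0, x - 1)
        if x >= GOAL:
            break  # the run ends once the goal is reached
    return 10 * x - frames


def breed(parent_a, parent_b, rng, mutation_rate=0.05):
    """Splice two parents together, then randomly change a few button presses."""
    cut = rng.randrange(1, len(parent_a))
    child = parent_a[:cut] + parent_b[cut:]
    return [rng.choice(BUTTONS) if rng.random() < mutation_rate else b for b in child]


def evolve(generations=30, population_size=40, seed=0):
    """Return the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        # Keep the top quarter unchanged and fill the rest with their children.
        survivors = ranked[: population_size // 4]
        children = [
            breed(rng.choice(survivors), rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return best_per_generation


if __name__ == "__main__":
    print(evolve())
```

Because the best performers are kept from one generation to the next, the top score in this sketch never goes down. It either holds steady or improves, which is roughly the pattern viewers see as Rupert grinds through a level.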

It’s slow going, but it works. It only took Rupert 57 generations to beat level one, prompting celebration in the comments as viewers cheered Rupert’s success.

Rupert, along with another TikTok-streaming AI Mario player affectionately named George, is running an open-source program called MarI/O. It was built by coder and live-streamer Seth Hendrickson, who goes by SethBling online. MarI/O isn’t new. Hendrickson released it years ago, but the robot’s machinations have a renewed significance in an era where the tech industry wants us to believe AI will soon take over the world.

MarI/O is far simpler than a system like ChatGPT, but it’s a window into how AI models work. These AI tools sort of throw spaghetti at the wall, and humans design systems to tell them whether this attempt was better or worse than the last one. As time goes on, the attempts get better. Now imagine that happening millions or billions of times. You can see a more detailed explainer in one of Hendrickson’s videos.

With ChatGPT, it’s vastly more complicated. MarI/O doesn’t have that many options: left, right, up, down, A, B, X, and Y. The English language, on the other hand, has hundreds of thousands of words, a countless number of ways to arrange those words, and a theoretically infinite number of ideas. MarI/O is so much simpler than ChatGPT, and the tech is fundamentally different, but if you get how MarI/O works, you can extrapolate from it to a useful understanding of chatbot technology.
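To put a number on how small Mario's world of choices is, here is a quick count. It assumes each of the eight buttons is simply held or not held at any given moment, which is a simplification for illustration rather than a description of how MarI/O handles its outputs.

```python
from itertools import product

BUTTONS = ["left", "right", "up", "down", "A", "B", "X", "Y"]

# Every way of holding or releasing each of the eight buttons at one moment.
combos = list(product([False, True], repeat=len(BUTTONS)))

print(len(combos))  # 2 ** 8 = 256
```

That is 256 possible button combinations at any instant, a list short enough to print on a single page.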

Rupert, sadly, is just a little guy. It’s doing its best, but Rupert is going to have trouble when it gets farther in the game. MarI/O’s system only rewards the AI based on how far Mario gets to the right of the screen, but on some levels in Super Mario World, you have to climb up to reach the goal, rather than go to the right.
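The problem is easy to see in a small sketch. The two scoring functions below are hypothetical: the field names, weights, and numbers are made up for illustration, and neither is taken from MarI/O or from any modification the streamer has planned. The first rewards only rightward progress, and the second also gives credit for height.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one attempt at a level."""
    max_x: int       # farthest distance reached to the right
    max_height: int  # highest point climbed
    frames: int      # how long the attempt lasted


def rightward_fitness(run, time_penalty=0.5):
    """Score a run only on rightward distance, minus a penalty for time spent."""
    return run.max_x - time_penalty * run.frames


def climbing_fitness(run, height_weight=1.0, time_penalty=0.5):
    """Same as above, but also give credit for climbing."""
    return run.max_x + height_weight * run.max_height - time_penalty * run.frames


runner = RunResult(max_x=300, max_height=0, frames=200)
climber = RunResult(max_x=100, max_height=400, frames=200)

print(rightward_fitness(runner), rightward_fitness(climber))
print(climbing_fitness(runner), climbing_fitness(climber))
```

Under the first function, the run that climbs toward the goal scores worse than the run that sprints right into a dead end, so the climbing behavior would never be selected for breeding. Under the second, the climber comes out ahead.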

“However, I am planning to modify it so that it can climb vertical structures better,” Join The PCMasterRace said.