On Wednesday, Destiny 2 players began to see a warning message. A red line stretched across the bottom of their screens, and the text below it stated: “Attention: contacting Destiny 2 servers.” Minutes later, huge numbers of players got booted from the massively multiplayer online sci-fi shooter.
At Kotaku, we lost connection at 4:11 pm ET (6:11am Thursday AEST). My character, who had been freshly boosted to power level 750 as part of the day’s grand game-wide overhaul and expansion, disappeared (she returned later).
On Twitch, nearly all the top streamers’ screens went grey, sporting a message that they were being kicked into a queue to get back in.
About an hour later, the official support account for the game, @bungiehelp, tweeted that the game was being taken offline: “Destiny 1 and Destiny 2 are being taken offline for emergency maintenance. Please standby for updates.”
Both Destiny games would remain down for several more hours. Some players fumed on social media and forums. Some said they would patiently wait. Still others vented in character, so there was at least something to laugh about: “I quit my job, I left my wife and I didn’t pick up my kids from Soccer Practice just so I can fulfil my Guardian duties. This is unacceptable and I am greatly disappointed.”
At 8:54 pm ET (10:54am AEST), Bungie Help stated that maintenance was complete. At that point, players were finally able to get back in. The biggest game release of the day had just been offline for about five hours.
To some people, this was a huge surprise. Bungie, an established studio that has been making and expanding Destiny for years, had delayed the launch of its newest expansion, Shadowkeep, by two weeks and was now, on October 2, surely ready to launch.
To others, this was to be expected, a normal occurrence in gaming, where the launches of so many big online games have involved day-one server problems.
But why, despite all the smart people out there making video games, do launches so often go awry?
Bungie has not yet said exactly what went wrong with this launch, though a rep for the company told Kotaku that some insight may come as soon as today, in the company’s weekly blog post.
Nevertheless, several game developers at other studios shared with Kotaku their understanding of why this kind of thing happens. Some have worked on big online launches, others have heard about the travails from peers.
Game developer and ardent Destiny player Rami Ismail provided some context by way of a metaphor as extended as a transcontinental railroad. According to him, the problems could be a lack of servers, which would mean a lack of computers needed to handle all connections players are attempting to make to the game. “Imagine a train station where there’s not enough tracks, but too many trains,” he said.
Or, he theorised, the issue might be on the client side, meaning the player’s console or PC, where a bug could be sending back more data than the servers can process: “It’d be like a station where there’s enough tracks, but a lot of unexpected trains coming in.”
Ismail also speculated that it could be a problem with the code on the servers, slowing down how quickly they process data from the players’ machines: “It’d be like a station where there’s enough trains and tracks, but every passenger has to get out of the train, and fill out a form or something.”
Or it could be some third-party code or server that can’t handle the influx of players, but which the developers don’t have as much control over: “So now there’s a station across the border that’s not sending enough trains, and people are getting stuck at the station.”
Any of these problems could cause a failure in the overall server network for the game, he said. “That’s a common issue, so it’d offload it to nearby cores or servers. So now this one station is sending trains or passengers to other stations that were already working at capacity. Now those are affected too. The effect ripples through every station available. Some of them might shut down, or not be able to accept more trains.”
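Ismail’s ripple effect can be sketched as a toy simulation. This is a minimal Python illustration under simplifying assumptions of our own, not a description of how Destiny’s servers are built: every server (“station”) has the same hypothetical capacity, and a failed server’s load is split evenly across all the servers still running rather than only its nearest neighbours. The function name `cascade` and all the numbers are made up for the example.

```python
def cascade(loads: list[float], capacity: float, first_failure: int) -> set[int]:
    """Return the indices of servers that end up down after one server fails.

    Hypothetical model: all servers share one capacity, and a failed server's
    load is spread evenly over every server that is still running.
    """
    loads = list(loads)            # copy so the caller's list is untouched
    failed: set[int] = set()
    to_fail = [first_failure]      # servers pushed past capacity, waiting to go down
    while to_fail:
        idx = to_fail.pop()
        if idx in failed:
            continue
        failed.add(idx)
        alive = [i for i in range(len(loads)) if i not in failed]
        if not alive:
            break                  # nothing left to offload onto
        share = loads[idx] / len(alive)
        loads[idx] = 0.0
        for i in alive:
            loads[i] += share      # the failed station's trains land here
            if loads[i] > capacity and i not in to_fail:
                to_fail.append(i)  # this station is now over capacity too
    return failed


print(sorted(cascade([80, 80, 80, 80], 100, 0)))  # [0, 1, 2, 3]
print(sorted(cascade([50, 50, 50, 50], 100, 0)))  # [0]
```

When every server is already running at 80% of capacity, losing one pushes the rest over the edge and the whole pool goes down; at 50%, the same failure is absorbed. That spare headroom is what a launch-day crush tends to eat into.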
All of these situations are hard to plan for, Ismail said. Testing is possible, but nothing beats the real thing. “When the trains start moving for real is usually the first time you can see how everything goes. Maybe the train from station A to station D is actually 2% slower than you thought. Now train station D is slowly filling up, until eventually that failure occurs.”
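The arithmetic behind that slow fill-up is simple enough to sketch. In this hypothetical Python example, station D is expected to clear 1,000 trains a minute to match 1,000 arrivals, but it actually runs 2% slower, at 980. The function name, the rates and the 5,000-train backlog limit are all assumptions chosen for illustration.

```python
from typing import Optional


def minutes_until_overflow(service_per_min: int, arrivals_per_min: int,
                           max_backlog: int, max_minutes: int = 100_000) -> Optional[int]:
    """Minute at which the waiting backlog first exceeds max_backlog, or None."""
    backlog = 0
    for minute in range(1, max_minutes + 1):
        backlog += arrivals_per_min                # trains pulling in this minute
        backlog -= min(backlog, service_per_min)   # trains the station clears this minute
        if backlog > max_backlog:
            return minute                          # station D finally fails
    return None                                    # never overflows within the horizon


print(minutes_until_overflow(980, 1_000, 5_000))    # 251
print(minutes_until_overflow(1_000, 1_000, 5_000))  # None
```

Nothing looks wrong in the first minute, when the backlog is just 20 trains, yet on these made-up numbers the station overflows a little over four hours in. A short test run can easily miss this kind of slow leak.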
All of the developers who spoke to Kotaku were sceptical that rough launches are due to a lack of available servers to handle the increase in players that comes with a new release. There is no scarcity of servers in the marketplace, they said, with top providers Amazon, Google and Microsoft making more available on demand fairly easily.
Top game publishers have the money to scale up when needed, though Bungie itself is notably a self-publishing indie studio for this week’s Destiny release. The studio and the franchise split with the powerful publisher Activision earlier this year.
“When a large game has a bad launch it’s usually not due to lack of machines,” said developer Josh Ling. “They have the machines, but the software running on those machines is causing issues: crashes, system failure etc.”
Drew Thaler, a developer who worked for the large studio Naughty Dog for four years, helped maintain or launch multiplayer servers for Uncharted and The Last Of Us games. He now runs an online services company called Mesh. When the PS3’s The Last Of Us was remastered for the PS4, the studio ran into problems even though, in his words, “the game was almost literally identical.” Unfortunately, he said, the problems stemmed from a piece of tech provided by Sony that hadn’t yet been used at a large scale.
“What it looked like to the end user was long matchmaking times,” he said. “The game code would form a match, and then some of the people wouldn’t be able to directly connect to each other, and the match would be forced to dissolve.”
The team came up with a workaround, only to discover months later that a minor piece of code, “the SSL stack in a Java implementation”, was causing the problem. It was an incredibly obscure, low-level bug to have such a large impact on matchmaking, but it did. “It’s like Google Maps on your phone giving you bad directions... and tracing the reason back to faulty electrical wiring at your house.”
Then Uncharted 4 launched with far more ambitious multiplayer than Naughty Dog had attempted before: more players, computer-controlled sidekicks, virtual currency and more, all of which involved new code. “That’s especially where you run into the problems,” he said. “It’ll work great in tests, and even in simulations with a few thousand people hitting it, but that first hour, when 100,000 people all log on in the same minute?”
“Network services aren’t really just a server. They’re complex systems,” he said. “This server handles matchmaking, this one handles queuing, this one is receiving logs, this one is doing the NAT tunnelling, this one is doing location queries for better matching, this one is handling leaderboards, this one is handling authentication, this one is virtual currency and needs to be especially bulletproof against losing transactions and credits…” Failure can happen at any of those points.
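One way to see why more services means more chances to fail is a back-of-the-envelope availability calculation. The sketch below assumes, hypothetically, that a player needs every service Thaler lists to be up at once, that each is up 99.9% of the time, and that they fail independently; none of those figures come from Naughty Dog.

```python
import math

# Services named in Thaler's quote. The 99.9% uptime for each is a
# hypothetical assumption, not a measured number.
SERVICES = {
    "matchmaking": 0.999,
    "queuing": 0.999,
    "logging": 0.999,
    "nat_tunnelling": 0.999,
    "location_queries": 0.999,
    "leaderboards": 0.999,
    "authentication": 0.999,
    "virtual_currency": 0.999,
}


def combined_availability(availabilities) -> float:
    """Chance that every service is up at the same moment, assuming
    independent failures and that all of them are required."""
    return math.prod(availabilities)


overall = combined_availability(SERVICES.values())
print(f"{overall:.4f}")  # 0.9920
```

Eight pieces that are each up 99.9% of the time combine to roughly 99.2%, so on these assumptions the chain as a whole is down nearly eight times as often as any single link. Real services aren’t fully independent, so treat this as a rough intuition rather than a forecast.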
Most games aren’t remasters. They’re not the relatively simple case of that Last Of Us port to PS4 and are more like the Uncharted 4 launch. Developers are doing something new, constantly coming up with new code, then seeing what happens.
At times, a developer’s attempt to offer a cool new feature can be catastrophic to the game’s online performance. Late last year, veteran game designer Raph Koster shared a story about the pioneering massively multiplayer game Ultima Online and the noble intention to give the game’s community virtual Christmas trees: “We didn’t have art for them, so I attached a script to a generic pine that spawned the little coloured gemstones we did have and vertically offset them,” he tweeted.
“Then they each had a script with periodic callbacks to appear and vanish, so they twinkled. Then we gave one Xmas tree to every player who logged in. Thousands of trees each with twenty callbacks on approx one second intervals. The message queue overloaded on every server and crashed the entire service. On Christmas. I had the week off.”
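Koster’s numbers go a long way toward explaining the crash. The Python sketch below multiplies them out. Koster only said “thousands” of trees, so the 5,000 trees, the queue’s 50,000-messages-per-second capacity and its normal 10,000-message load are all hypothetical figures chosen for illustration.

```python
def tree_message_load(trees: int, callbacks_per_tree: int = 20,
                      interval_s: float = 1.0) -> float:
    """Messages per second generated by the twinkling scripts alone."""
    return trees * callbacks_per_tree / interval_s


def queue_overloaded(trees: int, baseline_per_s: float, capacity_per_s: float) -> bool:
    """True if normal traffic plus the tree callbacks exceeds what the queue can process."""
    return baseline_per_s + tree_message_load(trees) > capacity_per_s


# Hypothetical queue: copes with 50,000 messages/s and normally sees 10,000.
print(tree_message_load(5_000))                 # 100000.0
print(queue_overloaded(100, 10_000, 50_000))    # False: a few test trees are harmless
print(queue_overloaded(5_000, 10_000, 50_000))  # True: one tree per logged-in player is not
```

A few test trees barely register, but handing one to every player who logs in adds load in proportion to the player count, which is how a harmless-looking decoration can swamp a server.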
Far more recently, Ubisoft launched The Division 2, right on the heels of a widely played beta for the game. Just a few days after the full game’s release, players found that their turrets, drones and other equippable skills were self-destructing as soon as they had been activated.
The developers quickly traced the problem to the game’s servers, which were getting confused by status effects applied to those skills. The effects had been accumulating and eventually stacked up once lots of players were using the game. “It’s an issue that’s been there probably from the beginning,” one of the game’s developers later explained on a livestream, “but had just been building up.” Once the code was fixed, the problem swiftly went away.
If there are problems that emerge with scale, they show up most painfully when a game is suddenly free. On Wednesday, Bungie didn’t just release a new expansion for its big-budget AAA game Destiny 2. The studio also made most of the released content that preceded it free to play, most assuredly inviting a crush of new players to further stress the game’s online infrastructure.
“AAA F2P is also a massive multiplier on your userbase that you might not fully expect if you’re coming from a non-F2P world,” said Thaler, the ex-Naughty Dog developer. “We had a F2P weekend for Uncharted (in our beta) and it crushed our servers with 10-20x normal load.”
Destiny 2 has worked well for many players since its rough launch day. It’s not been great for everyone, though. Hours after the initial outage, players were still complaining of an error codenamed Weasel.
On Thursday, Bungie advised players who were getting Weasel or an error called Squirrel to remove Cyrillic characters from their clan name or Steam username, respectively, if possible. (It also advised people to try not to have more than 300 friends on Steam, at least for the moment.)
Developers we spoke to don’t foresee an end to this kind of thing but do see ways it might be minimised in the future. The increase of online elements in more and more games, while annoying to some players, is giving more developers experience with building viable online infrastructure and troubleshooting problems quickly.
Third-party operations like U.K.-based Multiplay, which helped handle the massive but fairly smooth launch of EA and Respawn’s free-to-play battle royale game Apex Legends, may also play an increasingly useful role as games, and the people who make them, grow more ambitious. Even the seemingly omnipresent pre-launch betas, which are easy to dismiss as marketing, can actually help show whether a game’s online infrastructure is up to snuff. None of this will eliminate problems, but it can help.
Game makers still need to act swiftly to fix things, and relatively speaking, Bungie seems to have done so, turning a collapsing game back into a functioning one in a matter of hours. That’s not easy. “You have to do it under enormous time pressure and get it done yesterday,” Thaler said, “because everybody wants to play their favourite game and they can’t do that until you solve it.”