Between the Lanes: Denying Denial of Service
Author: kyled,
published 8 months ago,
[img]https://clan.cloudflare.steamstatic.com/images//3703047/4d7209c869530f3e0f737e28a1156e7017314660.jpg[/img]
[i]Welcome back to Between the Lanes, a blog feature where members of our development team walk through some of the challenges, bugfixes, and occasional happy accidents we encounter while working on a game as unique as Dota.[/i]
This is a story about the internet, and how it doesn’t work like it should, when it works at all, except when it doesn’t. It’s a story about how the internet stopped working for our customers, and how we fixed it.
The internet is more of a wild frontier than we give it credit for being. Stray from the safe confines of your two-factor authentication and trusted cookies, and it can be a bit of a wilderness out there, full of random trolls with the malice (and, increasingly, the ability) to make your life pretty miserable for no other reason than because they can.
Back in 2014, the method those trolls were using was distributed denial-of-service (DDoS) attacks. “Distributed” refers to a large number of internet hosts maliciously flooding a particular target IP with traffic in an effort to overwhelm the network stack. This is called a volumetric attack, and the intent isn’t to try to get into the network. It’s just trying to deny service. A distributed denial of service means that legitimate people who want legitimate service are crowded out by the bad traffic.
The problem with DDoS attacks was that, by late 2014, they weren’t being committed by data-exfiltrating super-hackers with advanced computer science degrees. They were being committed by just about anybody who wanted to pay a service or a bot to do it for them. And it’s pretty obvious why. Although some people were happy to do it purely for vandalism’s sake, others had a motive: A DDoS attack was a surefire way to shut down a match that either you or someone you were rooting for was losing. This wasn’t just an occasional irritation anymore. It was turning into an outright assault on any game where players competed.
By the opening months of 2015, we were seeing a huge uptick in DDoS attacks on Dota and CS:GO, with other companies reporting a huge jump as well. Someone had, very suddenly, made it very easy for anyone to do this.
In August 2015, The International was disrupted with DDoS attacks. Although the pros playing the match weren’t affected, for more than two hours, the broadcasters couldn’t get into the matches to give play-by-play and color commentary. Sending out the stream as a TV broadcast became an issue. The players were suddenly playing in a void. This was a professional gaming event with millions watching and millions of dollars on the line, and it was being disrupted by random people with five-dollar software. It was a problem Valve couldn’t ignore.
[img]https://clan.cloudflare.steamstatic.com/images//3703047/c63292d66774594361f081935573b7567dbfe7b6.jpg[/img]
We tried several solutions to deal with DDoS attacks before we arrived at one that worked. Initially, we attempted to filter the traffic with a powerful network switch. Unfortunately, this type of filtering is inherently difficult to do with game traffic. It is the nature of game servers to receive unsolicited UDP (User Datagram Protocol) traffic from arbitrary IP addresses. Imagine you had a post office that weeded out unwanted junk mail for you. But now imagine your job is as an advice columnist, and you receive tons of legitimate mail from random strangers all the time. For you, the post office doesn't know what's junk mail and what isn't. That's how traffic to game servers tends to look. Furthermore, the source IP in UDP packets is not secured, and can be easily spoofed. Our post office cannot even look at the return address on the envelope for clues, because the senders of junk mail forge that.
Steam delivers a lot of bits for game content, and has built up a large network for doing so. We were already taking advantage of this network to deliver game traffic over dedicated links, obtaining good peering, ensuring that networking engineering best practices were used, etc. This kept player ping times low, but did not protect against DDoS attacks. The problem is that UDP-based protocols are not secure by default, so while we had our own network, it wasn't private.
To prevent attackers from using our own network to attack our servers, we needed to control all the entrances and secure them. We accomplished this by creating proxies for game traffic, routing every single packet of data transmitted across the network through relays. Now when a client wanted to talk to a game server, it had to do so through a relay that both authenticated it and proxied that traffic to the game server. This meant the IP address of the server was always hidden, so the attacker simply had no idea where to attack.
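To make the "authenticate, then proxy" idea a little more concrete, here is a minimal in-memory sketch in Python. It is not how our relays are actually implemented. The HMAC-signed ticket, the shared `SECRET`, the `issue_ticket` helper, and the `Relay` class are all assumptions invented for illustration, and no real network traffic is involved.

```python
import hashlib
import hmac

# Hypothetical key shared by the ticket issuer and the relay (illustration only).
SECRET = b"demo-key-shared-by-ticket-issuer-and-relay"


def issue_ticket(client_id: str, session_id: str) -> bytes:
    """Sign (client, session) so a relay can later verify the client."""
    message = f"{client_id}|{session_id}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).digest()


class Relay:
    """Toy relay: verifies a ticket, then forwards to a hidden server address."""

    def __init__(self, sessions):
        # session_id -> server address; this table is never shown to clients.
        self._sessions = dict(sessions)
        self.forwarded = []  # (server_address, payload) pairs we proxied

    def handle(self, client_id, session_id, ticket, payload) -> bool:
        expected = issue_ticket(client_id, session_id)
        if not hmac.compare_digest(expected, ticket):
            return False  # unauthenticated traffic is dropped at the relay
        server = self._sessions.get(session_id)
        if server is None:
            return False  # unknown session, nothing to forward to
        self.forwarded.append((server, payload))
        return True


relay = Relay({"match-1": ("10.0.0.5", 27015)})
ticket = issue_ticket("alice", "match-1")
print(relay.handle("alice", "match-1", ticket, b"move"))            # True
print(relay.handle("mallory", "match-1", b"\x00" * 32, b"flood"))  # False
```

The point of the sketch is that the client only ever names a session, never a server address, so a packet without a valid ticket has nowhere to go.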
To re-use our antiquated post office metaphor from earlier, our spammer no longer had an address to send junk mail to. They could send it to every post office in the area and ask them to mail it, but without authorization, those post offices aren’t going to. (Moreover, any post office would find it a little suspicious that someone was trying to send a single person 100,000 letters.)
[img]https://clan.cloudflare.steamstatic.com/images//3703047/d4e0ef987a9df51ac53309c6a1483cfcdd6749fd.jpg[/img]
But couldn’t you just attack the relay? Technically, you could. But we have an essentially limitless number of them, and we built them to be attacked. A “relay” is just another word for a computer running software. You can attack it or take it offline, but the protocol was designed with that assumption in mind. If a client is trying to play a game and loses contact with a relay, it just switches to another. Relays are like hundreds of pawns scattered around the world with the singular purpose of guarding the game server. (Incidentally, taking out a relay is harder than it sounds. They’re engineered pretty well and positioned in a specific part of the network, so although they were built to be taken offline, we haven’t lost one yet.)
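The "just switch to another relay" behavior can be sketched in a few lines of Python. This is an illustrative toy, not the real protocol: `FakeRelay`, `RelayDown`, and `Client` are hypothetical names, and the round-robin retry is simply the easiest failover policy to show.

```python
class RelayDown(Exception):
    """Raised by a toy relay that has been taken offline."""


class FakeRelay:
    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.delivered = []

    def send(self, payload):
        if not self.online:
            raise RelayDown(self.name)
        self.delivered.append(payload)


class Client:
    """Sticks with its current relay and moves on only when that relay fails."""

    def __init__(self, relays):
        if not relays:
            raise ValueError("need at least one relay")
        self._relays = list(relays)
        self._current = 0

    def send(self, payload):
        # Try each relay at most once, starting with the one we are using now.
        for _ in range(len(self._relays)):
            relay = self._relays[self._current]
            try:
                relay.send(payload)
                return relay.name
            except RelayDown:
                self._current = (self._current + 1) % len(self._relays)
        raise ConnectionError("no relay reachable")


a, b = FakeRelay("relay-a"), FakeRelay("relay-b")
client = Client([a, b])
print(client.send(b"tick 1"))  # relay-a
a.online = False               # simulate relay-a being knocked offline
print(client.send(b"tick 2"))  # relay-b
```

From the player's point of view nothing changes when a relay disappears; the game keeps talking to the same server through a different door.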
The solution was straightforward but effective. Before, if someone wanted to disrupt a game, they could just overpower a single game server (a very low bar to clear). Now they had to overpower essentially the entire data center—a much, much, much higher bar. Are there attacks that could still accomplish this? Of course. Are there attacks that can do this that anyone online could buy for five dollars? No. An attack this sophisticated was officially out of the price range of most people.
With this new system up and running, we had an epiphany: If we controlled our own private network, we wouldn’t be beholden to how the normal internet works. We could use it to make the customer experience even better. With the normal internet, when you send a packet from one IP address to another, the route you use is determined by Border Gateway Protocol (BGP). This is the routing protocol that decides how your packet will travel between networks, and you have no choice in the route it picks.
But with a virtual private network composed of hundreds of global relays and data centers, we could essentially choose our own route from the client to the game server—often a faster shortcut than the default route. If you’re using Steam Datagram Relay (SDR), the Steam overlay will show your ping time and what route we’re giving you, so you can see for yourself how it gets optimized.
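As a rough illustration of choosing a route through relays, the Python sketch below picks the relay with the lowest combined ping (client to relay plus relay to server). The `best_route` function, the relay names, and every latency figure are made up for the example. The real system weighs more than two numbers per relay.

```python
def best_route(client_to_relay_ms, relay_to_server_ms):
    """Return (relay, total_ms) for the relay with the lowest end-to-end ping.

    Both arguments map a relay name to a measured latency in milliseconds.
    Relays missing from either map are skipped, since one leg is unknown.
    """
    candidates = {
        relay: front + relay_to_server_ms[relay]
        for relay, front in client_to_relay_ms.items()
        if relay in relay_to_server_ms
    }
    if not candidates:
        raise ValueError("no relay has measurements for both legs")
    relay = min(candidates, key=candidates.get)
    return relay, candidates[relay]


# Hypothetical measurements, in milliseconds.
front_leg = {"fra": 20.0, "ams": 12.0, "sto": 30.0}
back_leg = {"fra": 15.0, "ams": 30.0, "sto": 8.0}

relay, total = best_route(front_leg, back_leg)
print(relay, total)  # fra 35.0
```

Note that the closest relay to the client ("ams" here) is not the winner. What matters is the whole path, which is why a hand-picked route can beat the default one.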
[img]https://clan.cloudflare.steamstatic.com/images//3703047/82554298cf9ed002daf92613c81d19d6072ef09c.jpg[/img]
A feature that started as a way to protect Dota game servers has grown past what anyone could have expected. The SDR network routinely delivers as much as 140 million packets and 550 Gbit per second. We have relays in 31 data centers with a capacity of over 5 Tbit. What we now call the Steam Datagram Relay not only protects against DDoS attacks, but also increases connectivity and lowers ping for every Dota customer. And it doesn’t just do this for Dota, but for any game on Steam that wants to take advantage of it.
We hope you enjoyed another peek between the lanes of Dota. This was a pretty technical one, so thanks for hanging in there for it! And feel free to let us know what you’d like us to cover next.