Yeah, but can it run DOOM?
This ubiquitous question is what I was asking myself a couple of years ago at LiveKit's first hackathon. Could I play DOOM over LiveKit? Potentially ambitious for a 24-hour competition, but this was a rite of passage I had to attempt. So, I partnered with our resident WebRTC expert, Raja, and we set off to build something great.
After a long night, it worked: a Frankenstein WASM setup that ran in the browser and enabled multiplayer on the original DOOM over LiveKit data channels. It was laggy as hell, but it worked! I don't think we even placed in the hackathon, and since I'd (mostly) solved the problem, the project was forgotten.
Several months later, I found myself thinking about all the progress made in Linux gaming with Valve and the Steam Deck and how some cloud-gaming services showed quite a bit of promise. Then I started to wonder ... could LiveKit power a cloud gaming service?
One downside at the outset: successful cloud gaming services use peer-to-peer WebRTC, so there's no obvious reason to use LiveKit's SFU-centric model. But what if we colocated LiveKit with the game server to offset the added latency? That would allow for a more interactive experience: several players in the same LiveKit room, one or more of them able to control the game. It could even revive an old relic: split-screen co-op!
So maybe DOOM wasn't the goal this time; maybe it was something more interactive. Say something like Portal 2 co-op...
To even get to the point where cloud gaming over LiveKit was feasible, I had to figure out where to start. A huge boost came from the Headless Steam Service project: tons of great info and even better code in that repo to help me hit the ground running.
So, I had a headless Steam Docker container that I could run on a VM. Next, I needed a GPU attached to the VM to power the graphics and provide the hardware encoder. I compared prices across many providers and found that an NVIDIA T4 GPU in GCP's us-west1-b region was the only one available at the time, and it was relatively cheap.
LiveKit's closest cloud endpoint is in San Francisco on the US West Coast, but I decided to start with this setup and tune it later if needed. The RTT between the two regions was about 35ms, so if inputs stuttered or other latency issues cropped up, this would be the first place to tune: colocate with OSS LiveKit or move the VM to a closer data center.
With headless Steam ready to go, I needed two more pieces: input controls and a way to capture the screen on the VM and pipe it back into the LiveKit room.
The controls were relatively straightforward: I used JavaScript to capture mouse, keyboard, and gamepad inputs in the browser and sent them over LiveKit data channels. On the VM, I used the server-sdk-go to connect to the room and listen for inputs on the data channel. When an input arrived, I used a fork of this archived repo to forward it to virtual udev devices.
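To make that concrete, here's a rough sketch of the browser side using livekit-client. The message shape and the `input` topic are made up for illustration, and I'm assuming a recent SDK where `publishData` takes an options object; the Go listener on the VM defines the real protocol.

```typescript
import { Room } from 'livekit-client';

// Hypothetical wire format; the Go listener on the VM defines the real one.
type InputEvent =
  | { kind: 'key'; code: string; down: boolean }
  | { kind: 'pad'; axes: number[]; buttons: number[] };

async function startInputCapture(wsUrl: string, token: string) {
  const room = new Room();
  await room.connect(wsUrl, token);

  const encoder = new TextEncoder();
  const send = (ev: InputEvent) =>
    // Lossy delivery keeps latency down; discrete key events may be worth
    // sending reliably so a keyup is never lost.
    room.localParticipant.publishData(encoder.encode(JSON.stringify(ev)), {
      reliable: false,
      topic: 'input',
    });

  window.addEventListener('keydown', (e) => send({ kind: 'key', code: e.code, down: true }));
  window.addEventListener('keyup', (e) => send({ kind: 'key', code: e.code, down: false }));

  // The Gamepad API has no events for stick movement, so poll it every frame.
  const pollPads = () => {
    for (const pad of navigator.getGamepads()) {
      if (pad) {
        send({ kind: 'pad', axes: [...pad.axes], buttons: pad.buttons.map((b) => b.value) });
      }
    }
    requestAnimationFrame(pollPads);
  };
  requestAnimationFrame(pollPads);
}
```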
Capturing the screen was also fairly intuitive, thanks to the work done on the GStreamer LiveKit sink. I installed the NVIDIA device drivers and CUDA libraries to ensure I could use hardware-assisted H.264 encoding with the nvh264enc GStreamer plugin.
I tried several different software encoders with various tweaks but could never get the responsiveness that the hardware encoding offered. I also had to trim the pipeline to avoid buffering; dropped packets are preferred over late-arriving video. Shoutout to Theo for figuring this out.
I'm also (somewhat unfortunately) setting a lower bitrate and resolution in the GStreamer pipeline. Simulcast is not yet supported by the LiveKit GStreamer plugin, so I'm eagerly waiting for this pull request to land.
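For a feel of what the capture-and-encode half looks like, here's a simplified sketch. The element properties and numbers are illustrative, and the final element is a placeholder for the LiveKit GStreamer sink (I'm leaving out its signaller configuration), so treat this as the shape of the pipeline rather than the exact one I run.

```bash
# Grab the X display, scale it down, hardware-encode with NVENC, and keep the
# queue shallow and leaky so stale frames get dropped instead of buffered.
gst-launch-1.0 -e \
  ximagesrc use-damage=0 \
  ! videoconvert \
  ! videoscale ! video/x-raw,width=1280,height=720 \
  ! queue max-size-buffers=1 leaky=downstream \
  ! nvh264enc bitrate=3000 \
  ! fakesink  # stand-in for the LiveKit GStreamer sink
```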
Now, I was ready to put it all together! For the front end, I just used LiveKit Meet, hacked up to add the JavaScript input capture and data channel management. I ran this on the same VM, which was already handling encoding, headless Steam, and input management.
There was initially some stuttering with the controls, and I had to spend a ton of time tweaking the gamepad dead zone settings to remove drift. Thanks also to Jonas, who helped me figure out how to set the playout delay on the front-end client to zero so the controls felt as responsive as possible.
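In case it's useful to others, the gist of the playout-delay tweak is to touch the RTCRtpReceiver of the subscribed video track. A minimal sketch with livekit-client might look like the following, assuming the remote track exposes its receiver and that you're on Chromium, where playoutDelayHint is a non-standard hint:

```typescript
import { Room, RoomEvent, RemoteTrack } from 'livekit-client';

const room = new Room();

// As each remote track is subscribed, ask the browser to play it out with as
// little jitter buffering as possible. Non-Chromium browsers ignore the hint.
room.on(RoomEvent.TrackSubscribed, (track: RemoteTrack) => {
  const receiver = track.receiver;
  if (receiver) {
    (receiver as RTCRtpReceiver & { playoutDelayHint?: number }).playoutDelayHint = 0;
  }
});
```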
I did a ton of "testing" (it's great when testing is actually just gaming), and I told Russ that my barometer for success was whether I could get immersed in Portal 2 and forget that it was running over LiveKit cloud, how the controls were wired up, or whether frames were being dropped.
This actually happened; I spent two hours running through a bunch of co-op levels with a friend and only broke the immersion when we had to dance to exit one of the levels and the dance button wasn't mapped to the controller inputs.
Plenty more work could still be done to make this more efficient. Simulcast would allow for higher resolutions. As mentioned earlier, colocating with LiveKit cloud would immediately remove about 35ms of lag.
However, I never got around to either because it worked well enough even with that latency. This is all packaged up into a Docker container, and I'd love to test it out on higher-performance machines, but most services that rent access to consumer GPUs don't allow privileged containers, so I'm stuck on an underpowered VM for now.
So, yeah, it worked. Portal 2 co-op over LiveKit with minimal jank. Could it be better? Sure—there’s still a lot of tuning and optimizing left on the table. But the fact that I could spend hours gaming with a friend and forget this was all running on a cloud setup powered by LiveKit? That feels like a win to me. If you want to check out the code, it's all open source here.
And yes, before you ask—I did fire up DOOM just to make sure it worked. Heavy-metal riffs optional.