Deploy and scale agents on LiveKit Cloud

Russ d'Sa

Aug 19, 2025 • 3 min read

Over the past two years, as more teams pushed LiveKit voice agents to production, the same questions kept coming up:

How much CPU and memory do I allocate to my agent pools?
How do I handle sudden traffic spikes?
How can I instrument and optimize performance across sessions?

Our Agents framework makes building voice or video agents easy, but operating agents at scale is still hard. Today, we’re making it simple.

For a step-by-step tutorial on how to deploy a simple voice assistant to LiveKit Cloud, follow our Voice AI quickstart.

When you deploy and run your agent on LiveKit Cloud, we handle all the operational challenges for you:

Stateful load balancing. Unlike an HTTP request, a live session with a voice or video agent can last for minutes or hours. To preserve state and stream timing, an agent should stay on the same server for the duration of a session. LiveKit Cloud’s agent scheduler is built for long-lived sessions. We place sessions by effective load, not connection count—factoring in user locality, CPU/GPU, memory, and network conditions to minimize end-to-end latency and jitter.
Capacity management. Agents are resource-heavy; you can’t run hundreds of active voice calls on a single machine. The practical limit is on the order of tens, which leaves less buffer to absorb spikes. Each agent can also stress underlying resources differently; a voice agent may run a smaller model for noise cancellation, while a video agent might use a browser to perform an action. Agents deployed on LiveKit Cloud elastically scale to serve new requests. When an existing server nears its limit, we automatically spin up new instances and steer inbound traffic towards them.
Draining and instant rollbacks. When you update your agent code and deploy a new version, you’ll want to avoid any service disruptions or downtime. LiveKit Cloud will gracefully drain an agent server during a version update, allowing existing calls to complete while blocking that server from taking any new sessions. If you notice anything unexpected in an update, you can instantly roll your agent back to a previous version with a single command.
Operational observability. To quickly iterate on your agents, you need to understand how your users are interacting with them. There’s a new section in your LiveKit Cloud dashboard which provides an initial set of agent observability tools. For each agent deployment, you can view session analytics, build logs, and quality metrics like uptime, session start latency, and resource consumption.
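
To make the placement and draining ideas above concrete, here is a minimal sketch of load-aware session placement. It is an illustration only, not LiveKit Cloud's scheduler: the `Server` fields, the weighting in `effective_load`, and the 0.8 utilization limit are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpu: float             # fraction of CPU in use, 0.0 to 1.0
    memory: float          # fraction of memory in use, 0.0 to 1.0
    rtt_ms: float          # hypothetical round-trip time to the user
    draining: bool = False # True while a version update is rolling out


def effective_load(s: Server) -> float:
    # Assumed weighting: the most constrained resource dominates,
    # and network distance to the user adds a small penalty.
    return max(s.cpu, s.memory) + s.rtt_ms / 1000.0


def place_session(servers, limit=0.8):
    """Return the least-loaded eligible server, or None if none qualifies."""
    eligible = [
        s for s in servers
        if not s.draining and max(s.cpu, s.memory) < limit
    ]
    if not eligible:
        return None  # a real system would scale out at this point
    return min(eligible, key=effective_load)
```

A draining server is skipped for new sessions but is never evicted, so calls already running on it can finish. A `None` result stands in for the moment when new capacity would have to be added.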

Our philosophy has always been to do the undifferentiated heavy lifting for you and charge less than it would cost you to do it yourself. When you deploy an agent on LiveKit Cloud, each minute your agent is actively serving a user costs $0.01. We call it an agent session minute. This price encompasses hosting your agents across our global network of data centers, unlimited data transfer to and from your agent, and all observability and analytics features. On paid plans, agents are always warm and ready to take incoming connections.
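
As a rough way to budget from that rate, the sketch below multiplies expected active minutes by $0.01. The function name, the traffic figures, and the 30-day month are hypothetical; the post does not say how partial minutes are rounded, so this ignores billing granularity.

```python
from decimal import Decimal

# Rate quoted in this post, in USD per agent session minute.
PRICE_PER_AGENT_SESSION_MINUTE = Decimal("0.01")


def estimate_monthly_cost(sessions_per_day, avg_minutes_per_session, days=30):
    """Estimate monthly spend from active agent session minutes only."""
    minutes = (
        Decimal(sessions_per_day)
        * Decimal(str(avg_minutes_per_session))
        * days
    )
    return minutes * PRICE_PER_AGENT_SESSION_MINUTE


# Example: 1,000 sessions a day averaging 5 active minutes each.
print(estimate_monthly_cost(1000, 5))  # 150,000 minutes at $0.01 each
```

Only the minutes an agent spends actively serving a user count here; idle warm capacity is not part of the estimate.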

Our customers are already running amazing use cases in production: agents that provide mental health support, agents that tutor students on difficult subjects, agents that help patients schedule appointments with doctors, even agents that triage 911 emergency calls.

Enabling teams to deploy and run agents on LiveKit Cloud is a big step towards providing an end-to-end platform for voice, video, and physical AI applications. In the months ahead, we have new features and capabilities planned that will make building, testing, deploying, scaling, and monitoring your agents (every phase of the software development lifecycle) feel like one seamless process.

Stay tuned, and as always, we’d love to hear about what you’re building and how we can help. You can reach us anytime in Slack.
