LiveKit’s Series B: Building the all-in-one platform for voice AI agents
In September 2023, together with OpenAI, we unveiled ChatGPT Voice Mode. Alongside that launch, we released LiveKit Agents, an open source framework that made it easy for developers to build their own voice AI agents.
Back then, voice AI was not a thing: most investors we spoke to about our Series A said voice interfaces to AI models were “3-5 years out”. Then the GPT-4o unveil happened. LiveKit Agents was used in every demo, and seemingly overnight, voice AI became an industry.
There are now many teams using LiveKit Agents for voice-driven AI products, services, and interfaces. Hello Patient built a voice agent to manage numerous hospital workflows, Salient uses voice agents for loan servicing in the automotive industry, and Podium deploys AI employees within organizations for sales, scheduling, marketing, and customer support.
LiveKit Cloud, our ultra-low latency edge network that voice agents use to exchange audio data with users, has seen remarkable growth. The infrastructure now supports over 100,000 developers, who collectively handle more than 3 billion calls per year.
Voice will become the default way we interact with computers, and LiveKit is positioned to be the backbone of this paradigm shift.
Introducing LiveKit Agents 1.0
Agents 1.0 marks a key milestone in our journey toward giving developers everything they need to build high-quality, voice-driven AI applications. Alongside many new features like pipeline nodes, synchronized captioning, and client-agent RPC, there are a few major updates worth calling out.
Workflows
After speaking with hundreds of developers building voice agents, we’ve learned there are two broad classes: open-ended and closed-loop agents.
Conversations with an open-ended voice agent can meander, covering a broad range of topics in no particular order. The agent only needs basic tools like function calling or RAG. Examples include ChatGPT Advanced Voice Mode (AVM) and Character Voice, Speak for immersive language learning, and Tinder for dating advice.
Closed-loop voice agents operate differently. This type of agent is primarily targeted at replacing IVR systems or human operators in a deterministic business process (80% of which are accessed via telephone), like customer support, patient intake at hospitals, debt collection, loan qualification, or shipment planning. Prior to workflows, a developer implementing a closed-loop agent might try to describe the business process in a lengthy LLM system prompt paired with some function tools. Unfortunately, this doesn’t work well: LLMs are probabilistic computers and can’t (yet) reliably execute multi-step workflows.
LiveKit Agents 1.0 makes building closed-loop voice agents much easier. We’ve redesigned the entire framework to be lower-level and more flexible, allowing a developer to orchestrate multi-agent workflows that break an otherwise complex system prompt into discrete subtasks.
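To make this concrete, here’s a minimal sketch in Python of a two-step workflow where one agent hands the conversation off to the next. The agent names, instructions, and handoff tool are hypothetical; the pattern of returning the next agent from a function tool follows the 1.0 framework’s multi-agent design:

```python
from livekit.agents import Agent, function_tool


class SchedulingAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Offer available appointment slots and book one."
        )


class IntakeAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Greet the caller and collect their name and date of birth."
        )

    @function_tool
    async def intake_complete(self, name: str, date_of_birth: str):
        """Called once the caller's name and date of birth have been collected."""
        # Returning a new agent (plus a spoken reply) hands the conversation
        # off to the next subtask in the workflow.
        return SchedulingAgent(), f"Thanks {name}, let's find you an appointment."
```

Each agent carries a short, focused prompt for its own subtask, so no single system prompt has to encode the entire business process.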
Multilingual semantic turn detection
A few months ago, we introduced our first open source model, trained in-house to improve the accuracy of turn detection: one of the hardest problems in voice AI. The model was trained only on written English and thus could only make end-of-turn predictions for conversations in English.
Today we’re releasing a new, larger semantic turn detection model with multilingual capabilities. It runs inference on a CPU in under 25ms for a 100-token context and supports 13 languages: Chinese, Dutch, English, French, German, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. It can even be used for mixed-language conversations, where the end user or agent switches between multiple languages in a single conversation.
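For reference, here’s a minimal sketch of wiring the multilingual model into an agent session with the Python framework. The STT, LLM, and TTS plugin choices are illustrative, not required:

```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# VAD detects silence; the semantic turn detector predicts whether the
# user has actually finished their thought before the agent replies.
session = AgentSession(
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=openai.TTS(),
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)
```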
Telephony 1.0
We broke ground on our open source SIP telephony stack back in September 2023, right after the original ChatGPT Voice Mode launched. Our prediction was that voice AI would see rapid adoption in telecom, an industry where the only interface is voice. Since then, we’ve continued improving the reliability and performance of SIP and added features like HD audio, DTMF, cold and warm transfers, and automatic noise cancellation.
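As a quick illustration of the server-side API, here’s a sketch using the Python server SDK to dial an outbound call and drop the callee into a room where an agent can answer. The trunk ID, phone number, and room name are placeholders:

```python
import asyncio

from livekit import api


async def main() -> None:
    # LiveKitAPI reads LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
    # from the environment.
    lkapi = api.LiveKitAPI()
    try:
        # Dial out through a configured SIP trunk and place the callee
        # into a room as a participant.
        await lkapi.sip.create_sip_participant(
            api.CreateSIPParticipantRequest(
                sip_trunk_id="<outbound-trunk-id>",
                sip_call_to="+15105550100",
                room_name="support-call",
                participant_identity="caller",
            )
        )
    finally:
        await lkapi.aclose()


asyncio.run(main())
```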
LiveKit’s telephony stack now handles thousands of concurrent calls at any given moment. A quarter of 911 emergency dispatch centers in the US use LiveKit, helping save at least one life every week. To reflect the stack’s maturity and robustness, we’re bumping SIP to 1.0.
Cloud Agents
At the very second you’re reading this, hundreds of thousands of voice agents are running on LiveKit Cloud, having conversations with end users around the world. Supporting that level of scale requires a departure from the approach used for stateless web applications.
A voice agent is stateful: it runs inference on a GPU while constantly listening to you speak, deciding whether you’re done expressing your thoughts or whether it should interrupt. The length of a conversation is irregular, and just like most humans, a voice agent can’t hold multiple conversations at once.
Managing the lifecycle of an agent (location-aware elastic provisioning, load balancing, health-checking, transparent failover, context migration) is challenging to do efficiently, and it’s something we helped OpenAI figure out for Voice Mode. Since launching the Agents framework, developers have asked us for a solution that handles agent deployment and scaling out of the box.
Today we’re starting a closed beta of our solution for agent deployment and scaling: LiveKit Cloud Agents. Cloud Agents is to LiveKit’s Agents framework what Vercel is to Next.js. We host your agent code in a secure container, deploy it across LiveKit Cloud’s network of data centers around the world, and manage the entire DevOps lifecycle for you: provisioning, load balancing, logging, versioning, rollbacks, and more. We’ve been dogfooding Cloud Agents internally for our own agents and found it significantly accelerates the production rollout process. We can’t wait for you to try it and to hear your feedback.
If you’d like to join the Cloud Agents closed beta, please fill out this form.
LiveKit’s Series B financing
Just three years ago, LiveKit was an open source project that made it easier to connect with one another during the pandemic. We’ve since evolved into something bigger and more impactful than we ever imagined, but there’s far more building ahead of us than behind us.
With that in mind, we’re proud to share that we’ve raised an additional $45M in financing, bringing our total capital raised to $83M. Our Series A leads, Jamin and Brad at Altimeter, saw the coming shift from keyboards and mice to cameras and mics long before anyone else. They’ve decided to double down and lead LiveKit’s Series B. We’re also honored to have legendary infrastructure investor and operator Mike Volpi join this round with his first investment out of Hanabi Capital.
We plan to use this capital to grow our team and further our progress toward an all-in-one platform for building AI agents that can see, hear, and speak like we do.
This post wouldn’t be complete without a deep expression of gratitude to the developer community using and contributing to LiveKit. All of the amazing things you build with this stack are what have energized us, and continue to energize us, over the days, weeks, months, and years. We’re here for you and because of you.