Introducing the Grok Voice Agent API in partnership with xAI

Every day, millions of people around the world talk to Grok via first-party apps and in Tesla vehicles. The underlying model that brings Grok to life is a voice-to-voice model that understands the expressive range of human speech and can generate correspondingly expressive responses: it can laugh, whisper, and sigh.

The model can do this because it both processes incoming speech (including paralinguistic cues) and generates expressive speech output within a single model. Another benefit of this approach is reduced latency: with no handoff between separate speech-input and speech-output stages, Grok is able to reliably respond in less than 700 milliseconds. The net effect is an AI that feels more natural and humanlike to interact with.

In partnership with xAI, we’re excited to announce that you can now leverage the same technology stack powering the Grok voice experience in your own voice AI applications.

Using LiveKit with xAI’s Grok Voice Agent API

The Grok Voice Agent API is available today as a new plugin in LiveKit Agents for Python, with Node support planned. Selecting the model takes just a single line of code, and with a few more lines you can create a custom voice agent that has the same expressiveness and speed as Grok Voice Mode:

from dotenv import load_dotenv

from livekit import agents
from livekit.agents import Agent, AgentServer, AgentSession
from livekit.plugins import xai

# Load credentials (such as API keys) from a local .env file.
load_dotenv()


class GrokAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            You were created by xAI and LiveKit.""",
        )


server = AgentServer()


# Entry point that runs for each session the agent joins.
@server.rtc_session()
async def request_handler(ctx: agents.JobContext):
    session = AgentSession(
        # The Grok voice-to-voice model handles both speech input and output.
        llm=xai.realtime.RealtimeModel(),
    )
    await session.start(room=ctx.room, agent=GrokAssistant())
    # Have the agent speak first instead of waiting for the user.
    await session.generate_reply(instructions="Greet the user and offer to help.")


if __name__ == "__main__":
    agents.cli.run_app(server)
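The load_dotenv() call in the agent reads credentials from a local .env file, so a missing key is an easy way for a first run to fail. The following is a minimal preflight sketch, not part of LiveKit Agents, that reports which variables are unset. The variable names are assumptions: the LIVEKIT_* names follow LiveKit's usual convention, and XAI_API_KEY is a guess at what the xAI plugin reads, so check the plugin documentation for the exact names. The helper missing_env_vars is hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed variable names (see the note above); adjust to match your setup.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "XAI_API_KEY",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_ENV_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank.

    `environ` defaults to the process environment; passing a plain dict
    makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]
```

One way to use it would be to call missing_env_vars() after load_dotenv() and exit with a readable message if the returned list is not empty, before the agent server starts.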

From here, assuming you have a LiveKit Cloud account and project set up, you can run and talk to this agent directly from your command line:

uv run agent.py console

Or, using the CLI, deploy it to LiveKit Cloud and have your agent frontend connect to it:

lk agent create

LiveKit’s plugin for the Grok Voice Agent API includes a handful of voice options, support for custom tool calling, and integrated turn detection that can be configured for your specific use case. Notably, xAI offers prebuilt tools that let your voice agent search the web, X posts, and custom document or file collections. Voice agents built with this API also work with the rest of the LiveKit ecosystem: you can bring toys to life by pairing your agent with our ESP32 SDK on the client side, or grab a phone number that people can use to call your agent.
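To make custom tool calling concrete, here is a sketch of the kind of function an agent might expose as a tool. Everything in it is hypothetical: the order data, the function name lookup_order_status, and the wording of the replies. In LiveKit Agents, a function like this would typically be made available to the model by decorating it with function_tool from livekit.agents; that wiring is left out so the example runs with only the standard library.

```python
import asyncio
from typing import Dict

# Hypothetical in-memory data standing in for a real order system.
_ORDERS: Dict[str, str] = {
    "A100": "shipped",
    "A101": "processing",
}


async def lookup_order_status(order_id: str) -> str:
    """Look up the current status of a customer's order by its ID."""
    # Normalize what the model passes in, since spoken IDs can be messy.
    normalized = order_id.strip().upper()
    status = _ORDERS.get(normalized)
    if status is None:
        # Return a speakable message rather than raising, so the agent
        # can relay the problem to the caller.
        return f"I could not find an order with ID {normalized}."
    return f"Order {normalized} is currently {status}."
```

Returning short, complete sentences is a deliberate choice here: in a voice agent the tool result usually ends up informing what is spoken aloud, so plain language tends to work better than raw structured data.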

Grok Voice Agent API Use Cases

Because it can process the nuances of human speech and respond appropriately, a Grok voice agent is a game changer for numerous applications, including:

  • Customer service: Grok already interacts with customers at large scale via Tesla and Starlink support lines. Your voice agent can now handle your own customer inquiries with similar empathy and understanding, detecting frustration or satisfaction in a caller's tone and adjusting its responses on the fly.
  • Healthcare and therapy: Voice agents can provide companionship, coaching, mental health support, or patient intake interviews where emotional context and tone matter significantly.
  • Education and tutoring: A voice agent can adapt its teaching style based on a student's engagement level, confusion, or excitement. This is especially useful in language learning applications, where cultural elements are woven into how a particular language is spoken.
  • Sales and recruiting: Your voice-based GTM agents can conduct initial sales calls or interviews and qualify leads or candidates with the persuasiveness and rapport-building capabilities that come from understanding conversational intent.

We’re excited to see what you build with LiveKit and the new Grok Voice Agent API. If there’s any way we can help, please let us know in our Slack community and/or on X @livekit.