
Build a Kotlin Android Voice Agent with the LiveKit SDK

Ship a native Android voice agent with Jetpack Compose and LiveKit's client SDK. Real working Kotlin code for permissions, room connect, and audio level animations.

TL;DR — LiveKit's Android SDK + Compose components let you ship a production voice agent UI in under 200 lines. The SDK handles WebRTC, ICE, audio focus, and Bluetooth routing.

What you'll build

A Compose-based Android app that joins a LiveKit room hosting a Python LiveKit Agent. The user holds to talk; the agent transcribes, calls a tool to look up a salon appointment, and speaks back. We add an audio-level orb that pulses with the agent's voice.

Prerequisites

  1. Android Studio Iguana+, Kotlin 2.0, Compose BOM 2026.04.00+.
  2. LiveKit Cloud project (free tier) or a self-hosted server.
  3. A Python or Node LiveKit Agent already deployed.
  4. A token-issuing endpoint on your backend.
  5. RECORD_AUDIO permission in AndroidManifest.xml.
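The token endpoint (item 4) just needs to return a LiveKit server URL and a signed access token. A minimal sketch of the response shape as a Kotlin data class — the class and its `isUsable` check are illustrative, not part of any SDK; the `url`/`token` field names match what the fetch code in step 3 parses:

```kotlin
// Hypothetical response shape for your /token endpoint.
// Field names match what the client-side fetch code reads.
data class TokenResponse(
    val url: String,    // wss:// URL of your LiveKit server or Cloud project
    val token: String   // short-lived JWT minted with your API key/secret
) {
    // Basic sanity check before handing credentials to the SDK.
    fun isUsable(): Boolean =
        url.startsWith("wss://") && token.isNotBlank()
}
```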

Architecture

```mermaid
flowchart LR
  A[Android Compose app] -- LiveKit token --> S[Your /token endpoint]
  A -- WebRTC --> L[LiveKit Cloud]
  L -- WebRTC --> P[LiveKit Agent Python]
  P -- LLM --> O[OpenAI / Workers AI / etc.]
```

Step 1 — Add deps

In build.gradle.kts:

```kotlin
dependencies {
    implementation("io.livekit:livekit-android:2.13.0")
    implementation("io.livekit:livekit-android-compose-components:2.1.3")
    implementation("androidx.compose.material3:material3")
    implementation("androidx.lifecycle:lifecycle-runtime-compose:2.8.0")
}
```

Step 2 — Manifest permissions

You need `RECORD_AUDIO` for the mic, `MODIFY_AUDIO_SETTINGS` for audio routing, and `INTERNET` for the WebRTC connection:

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```

Step 3 — Token fetch

```kotlin
suspend fun fetchToken(): Pair<String, String> = withContext(Dispatchers.IO) {
    val client = OkHttpClient()
    val req = Request.Builder()
        .url("https://api.callsphere.ai/voice/token")
        .build()
    client.newCall(req).execute().use { res ->
        val json = JSONObject(res.body!!.string())
        json.getString("url") to json.getString("token")
    }
}
```
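Token fetches cross the network and can fail transiently right when the user taps "call." A small retry-with-backoff wrapper — plain Kotlin, an illustrative helper rather than anything from the LiveKit SDK — keeps the connect flow resilient:

```kotlin
// Generic retry with exponential backoff. Illustrative only:
// wrap your token fetch in this before connecting.
fun <T> retryWithBackoff(
    attempts: Int = 3,
    initialDelayMs: Long = 250,
    block: () -> T
): T {
    var delayMs = initialDelayMs
    var lastError: Exception? = null
    repeat(attempts) { attempt ->
        try {
            return block()
        } catch (e: Exception) {
            lastError = e
            if (attempt < attempts - 1) {
                Thread.sleep(delayMs)
                delayMs *= 2   // double the wait each time
            }
        }
    }
    throw lastError ?: IllegalStateException("retry failed")
}
```

Three attempts with a 250 ms starting delay covers a flaky cellular handoff without making the user wait more than a couple of seconds.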


Step 4 — Compose room screen

The RoomScope composable hides the connection lifecycle: it connects on composition and tears down on disposal. Inside it, RoomLocal.current gives you the connected Room to call methods on.

```kotlin
@Composable
fun VoiceAgentScreen() {
    var creds by remember { mutableStateOf<Pair<String, String>?>(null) }

    LaunchedEffect(Unit) {
        creds = fetchToken()
    }

    creds?.let { (url, token) ->
        RoomScope(url = url, token = token, audio = true, connect = true) {
            val room = RoomLocal.current
            val scope = rememberCoroutineScope()
            // The backend names agent participants with an "agent" prefix.
            val active = rememberParticipants()
                .firstOrNull { it.identity?.value?.startsWith("agent") == true }

            Column(Modifier.fillMaxSize().padding(24.dp)) {
                Text(
                    "CallSphere Salon Agent",
                    style = MaterialTheme.typography.headlineSmall
                )
                AgentOrb(participant = active)
                // Never runBlocking on the main thread — launch instead.
                Button(onClick = { scope.launch { room.disconnect() } }) {
                    Text("Hang up")
                }
            }
        }
    }
}
```
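The agent lookup above keys off a participant-identity prefix. That predicate is worth extracting into a plain function so you can unit-test it without a Room — an illustrative helper, not an SDK API:

```kotlin
// Matches the identity convention used in the screen above:
// the backend names agent participants with an "agent" prefix
// (e.g. "agent-salon-1"). Null identities never match.
fun isAgentIdentity(identity: String?): Boolean =
    identity?.startsWith("agent") == true
```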

Step 5 — Audio orb that pulses with the agent

LiveKit exposes a per-participant audio level you can observe and feed straight into a Compose animation:

```kotlin
@Composable
fun AgentOrb(participant: Participant?) {
    val level by participant
        ?.let { rememberAudioLevel(it) }
        ?: remember { mutableFloatStateOf(0f) }
    val scale by animateFloatAsState(
        targetValue = 1f + level * 1.5f,
        label = "orb"
    )

    Box(
        Modifier
            .size(180.dp)
            .graphicsLayer { scaleX = scale; scaleY = scale }
            .clip(CircleShape)
            .background(MaterialTheme.colorScheme.primary)
    )
}
```
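Raw audio levels can spike above 1.0 on some devices, which makes the orb jump wildly. The scale mapping from the composable above, extracted into pure Kotlin with clamping added (the clamp range is an assumption on my part):

```kotlin
// Maps a 0..1 audio level to an orb scale factor. Same
// 1f + level * 1.5f formula as the composable, but clamped
// so out-of-range levels can't produce a huge orb.
fun orbScale(level: Float): Float {
    val clamped = level.coerceIn(0f, 1f)
    return 1f + clamped * 1.5f
}
```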

Step 6 — Request mic permission

```kotlin
val micPerm = rememberLauncherForActivityResult(
    ActivityResultContracts.RequestPermission()
) { granted ->
    if (!granted) Log.w("CallSphere", "mic denied")
}

LaunchedEffect(Unit) {
    micPerm.launch(android.Manifest.permission.RECORD_AUDIO)
}
```

Common pitfalls

  • Forgetting MODIFY_AUDIO_SETTINGS — calls connect but you hear nothing.
  • Doing token fetch on the main thread — use Dispatchers.IO.
  • Calling room.connect from outside RoomScope — leaks; let the composable own lifecycle.
  • Bluetooth routing — set AudioManager.MODE_IN_COMMUNICATION for headsets.
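The first two pitfalls are both missing-permission bugs that only surface at runtime. A tiny test-time check — illustrative plain Kotlin, though the permission strings are the real Android constants from the manifest step — catches them before a device ever connects:

```kotlin
// Permissions a LiveKit voice call needs, per the manifest step above.
val REQUIRED_VOICE_PERMISSIONS = setOf(
    "android.permission.INTERNET",
    "android.permission.RECORD_AUDIO",
    "android.permission.MODIFY_AUDIO_SETTINGS"
)

// Returns whichever required permissions the manifest forgot to declare.
fun missingPermissions(declared: Set<String>): Set<String> =
    REQUIRED_VOICE_PERMISSIONS - declared
```

Run it in a unit test against the set of permissions parsed from your merged manifest.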

How CallSphere does this in production

CallSphere's salon CRM Android client uses LiveKit on top of NestJS + Prisma. The same backend powers our Real Estate and Healthcare verticals — 6 verticals, 37 agents, 115+ DB tables. Affiliate program pays 22% on referrals — see /affiliate.

FAQ

Why LiveKit and not raw WebRTC? TURN, SFU, codec negotiation, audio focus — LiveKit handles all of it.

Can I use Realtime directly without LiveKit? Yes, but you'd write your own PeerConnectionFactory glue.

Does it work on Android 13+ predictive back? Yes, the SDK observes lifecycle.

How big is the APK bloat? ~6MB for the LiveKit SDK + Compose components.

Can the agent be on Modal? Yes — see our Modal post in this series.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.