Build a Kotlin Android Voice Agent with the LiveKit SDK
Ship a native Android voice agent with Jetpack Compose and LiveKit's client SDK. Real working Kotlin code for permissions, room connect, and audio level animations.
TL;DR — LiveKit's Android SDK + Compose components let you ship a production voice agent UI in under 200 lines. The SDK handles WebRTC, ICE, audio focus, and Bluetooth routing.
What you'll build
A Compose-based Android app that joins a LiveKit room hosting a Python LiveKit Agent. The user holds-to-talk; the agent transcribes, calls a tool to look up a salon appointment, and speaks back. We add an audio-level orb that pulses with the agent's voice.
Prerequisites
- Android Studio Iguana+, Kotlin 2.0, Compose BOM 2026.04.00+.
- LiveKit Cloud project (free tier) or a self-hosted server.
- A Python or Node LiveKit Agent already deployed.
- A token-issuing endpoint on your backend.
- RECORD_AUDIO permission in AndroidManifest.xml.
Architecture
```mermaid
flowchart LR
    A[Android Compose app] -- LiveKit token --> S[Your /token endpoint]
    A -- WebRTC --> L[LiveKit Cloud]
    L -- WebRTC --> P[LiveKit Agent Python]
    P -- LLM --> O[OpenAI / Workers AI / etc.]
```
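The /token endpoint in the diagram mints a LiveKit access token, which is a standard HS256 JWT. As a sketch of what that endpoint does (in plain JVM Kotlin with no server SDK; the claim names follow LiveKit's documented token format, and the key/secret values here are placeholders — in production use the official server SDK for your backend language):

```kotlin
import java.util.Base64
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Mint a LiveKit access token by hand: an HS256 JWT whose payload carries
// the API key (iss), participant identity (sub), expiry (exp), and a video
// grant naming the room the participant may join.
fun mintToken(apiKey: String, apiSecret: String, identity: String, room: String): String {
    fun b64(bytes: ByteArray) = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes)
    val now = System.currentTimeMillis() / 1000
    val header = b64("""{"alg":"HS256","typ":"JWT"}""".toByteArray())
    val payload = b64(
        """{"iss":"$apiKey","sub":"$identity","exp":${now + 3600},"video":{"room":"$room","roomJoin":true}}"""
            .toByteArray()
    )
    val mac = Mac.getInstance("HmacSHA256").apply {
        init(SecretKeySpec(apiSecret.toByteArray(), "HmacSHA256"))
    }
    return "$header.$payload.${b64(mac.doFinal("$header.$payload".toByteArray()))}"
}
```

The Android app never sees the API secret; it only receives the finished token string from your backend.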
Step 1 — Add deps
In build.gradle.kts:
```kotlin
dependencies {
    implementation("io.livekit:livekit-android:2.13.0")
    implementation("io.livekit:livekit-android-compose-components:2.1.3")
    implementation("androidx.compose.material3:material3")
    implementation("androidx.lifecycle:lifecycle-runtime-compose:2.8.0")
}
```
Step 2 — Manifest permissions
```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```
Step 3 — Token fetch
```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.OkHttpClient
import okhttp3.Request
import org.json.JSONObject

// Returns (LiveKit server URL, access token) from your backend.
suspend fun fetchToken(): Pair<String, String> = withContext(Dispatchers.IO) {
    val client = OkHttpClient()
    val req = Request.Builder()
        .url("https://api.callsphere.ai/voice/token")
        .build()
    client.newCall(req).execute().use { res ->
        val json = JSONObject(res.body!!.string())
        json.getString("url") to json.getString("token")
    }
}
```
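Rather than fetching a fresh token on every screen load, you can cache it until shortly before its exp claim. A sketch of reading that claim out of the JWT (the regex parse is a shortcut for illustration; use a real JSON parser in production):

```kotlin
import java.util.Base64

// Extract the `exp` claim (Unix seconds) from a JWT payload, or null if
// the string isn't a parseable three-part token.
fun tokenExpiresAt(jwt: String): Long? {
    val payload = jwt.split(".").getOrNull(1) ?: return null
    val json = String(Base64.getUrlDecoder().decode(payload))
    return Regex("\"exp\"\\s*:\\s*(\\d+)").find(json)?.groupValues?.get(1)?.toLong()
}
```

Refresh when tokenExpiresAt(token) is within a minute or two of the current time.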
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 4 — Compose room screen
The RoomScope composable hides connection lifecycle. Inside it you get a Room you can call methods on.
```kotlin
@Composable
fun VoiceAgentScreen() {
    var creds by remember { mutableStateOf<Pair<String, String>?>(null) }

    LaunchedEffect(Unit) {
        creds = fetchToken()
    }

    creds?.let { (url, token) ->
        RoomScope(url = url, token = token, audio = true, connect = true) {
            val room = RoomLocal.current
            // The agent joins with an identity prefixed "agent".
            val active = rememberParticipants()
                .firstOrNull { it.identity?.value?.startsWith("agent") == true }

            Column(Modifier.fillMaxSize().padding(24.dp)) {
                Text("CallSphere Salon Agent",
                    style = MaterialTheme.typography.headlineSmall)
                AgentOrb(participant = active)
                // disconnect() is not a suspend function, so no
                // runBlocking is needed (and it would jank the UI thread).
                Button(onClick = { room.disconnect() }) {
                    Text("Hang up")
                }
            }
        }
    }
}
```
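The screen singles out the agent by an identity prefix. That "agent" prefix is this tutorial's naming convention, not a LiveKit guarantee, so it is worth pulling the check into a plain function you can unit-test and change in one place:

```kotlin
// True when a participant identity matches this app's agent naming
// convention ("agent", "agent-salon", …). Null identities never match.
fun isAgentIdentity(identity: String?): Boolean =
    identity?.startsWith("agent") == true
```

The composable then becomes `firstOrNull { isAgentIdentity(it.identity?.value) }`.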
Step 5 — Audio orb that pulses with the agent
LiveKit emits an AudioTrackStats.audioLevel you can subscribe to:
```kotlin
@Composable
fun AgentOrb(participant: Participant?) {
    // Fall back to a silent 0f level until the agent joins.
    val level by participant
        ?.let { rememberAudioLevel(it) }
        ?: remember { mutableFloatStateOf(0f) }
    val scale by animateFloatAsState(
        targetValue = 1f + level * 1.5f,
        label = "orb"
    )

    Box(
        Modifier
            .size(180.dp)
            .graphicsLayer { scaleX = scale; scaleY = scale }
            .clip(CircleShape)
            .background(MaterialTheme.colorScheme.primary)
    )
}
```
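The `1f + level * 1.5f` mapping assumes the level stays in 0..1; a spike past that would blow the orb up past 2.5x. A clamped helper (a sketch of this tutorial's mapping, not part of the LiveKit API) keeps the animation bounded and is trivially testable:

```kotlin
// Map a raw audio level to an orb scale factor, clamped so the orb
// ranges from 1.0x (silence) to 2.5x (full level) and never beyond.
fun orbScale(level: Float): Float =
    1f + level.coerceIn(0f, 1f) * 1.5f
```

Use it as `animateFloatAsState(targetValue = orbScale(level), label = "orb")`.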
Step 6 — Request mic permission
```kotlin
val micPerm = rememberLauncherForActivityResult(
    ActivityResultContracts.RequestPermission()
) { granted ->
    if (!granted) Log.w("CallSphere", "mic denied")
}

LaunchedEffect(Unit) {
    micPerm.launch(android.Manifest.permission.RECORD_AUDIO)
}
```

Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Common pitfalls
- Forgetting MODIFY_AUDIO_SETTINGS — calls connect but you hear nothing.
- Doing the token fetch on the main thread — use Dispatchers.IO.
- Calling room.connect from outside RoomScope — leaks; let the composable own the lifecycle.
- Bluetooth routing — set AudioManager.MODE_IN_COMMUNICATION for headsets.
How CallSphere does this in production
CallSphere's salon CRM Android client uses LiveKit on top of NestJS + Prisma. The same backend powers our Real Estate and Healthcare verticals — 6 verticals, 37 agents, 115+ DB tables. Affiliate program pays 22% on referrals — see /affiliate.
FAQ
Why LiveKit and not raw WebRTC? TURN, SFU, codec negotiation, audio focus — LiveKit handles all of it.
Can I use Realtime directly without LiveKit? Yes, but you'd write your own PeerConnectionFactory glue.
Does it work on Android 13+ predictive back? Yes, the SDK observes lifecycle.
How big is the APK bloat? ~6MB for the LiveKit SDK + Compose components.
Can the agent be on Modal? Yes — see our Modal post in this series.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.