Kafka 4.0 for AI Call Analytics Fan-Out: KRaft, Share Groups, Eligible Leader Replicas
Kafka 4.0 ditches ZooKeeper, ships share groups (KIP-932), and previews Eligible Leader Replicas. Here's how we fan out 50k AI call events per minute to analytics, billing, CRM, and ML training.
TL;DR — Kafka 4.0 is the first major release with zero ZooKeeper (KRaft default), KIP-848 next-gen consumer rebalance GA, KIP-932 share groups (queue semantics on a log), and KIP-966 Eligible Leader Replicas in preview. For AI call analytics, this is the difference between a 30-second rebalance stall and an invisible reassignment.
The pattern
When an AI voice agent finishes a call, ten downstream systems want to know: billing, CRM, the embedding pipeline, the QA dashboard, the recording archiver, the ML training set, the affiliate attribution job, the SLA monitor, the fraud detector, and the founder's Slack. Kafka is the canonical fan-out backbone: produce once, consume independently, replay on demand.
Kafka 4.0 (released March 2025, mainline through 2026) closed the last operator-pain gaps: KRaft is the default and only mode, KIP-848 makes consumer rebalances incremental rather than stop-the-world, and Eligible Leader Replicas (KIP-966) track replicas guaranteed to hold everything up to the high watermark, so a safe leader can be elected even when the ISR shrinks.
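Before touching configs, it's worth sizing what 7-day retention of this stream actually costs. A back-of-envelope sketch — the 50k events/min is from this post, but the ~2 KiB average envelope size and replication factor 3 are assumptions you should replace with your own numbers:

```python
# Retention sizing for the 7-day call.completed topic.
# Assumed: ~2 KiB avg CloudEvents envelope, replication factor 3.
EVENTS_PER_MIN = 50_000
AVG_EVENT_BYTES = 2 * 1024
RETENTION_MINUTES = 7 * 24 * 60      # matches retention.ms=604800000
REPLICATION_FACTOR = 3

events = EVENTS_PER_MIN * RETENTION_MINUTES
on_disk_bytes = events * AVG_EVENT_BYTES * REPLICATION_FACTOR

print(f"{events:,} events retained")             # 504,000,000 events
print(f"{on_disk_bytes / 1e12:.1f} TB on disk")  # 3.1 TB across the cluster
```

Roughly 3 TB across three brokers for a week of buffer — cheap enough that replay-on-demand is a non-decision.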
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
How it works (architecture)
flowchart LR
Voice[Voice agent] -->|produce call.completed| K[(Kafka 4.0<br/>KRaft cluster<br/>3 brokers)]
K --> CG1[Consumer group: billing]
K --> CG2[Consumer group: crm-sync]
K --> CG3[Consumer group: embeddings]
K --> CG4[Consumer group: qa-dashboard]
K --> CG5[Share group: ml-training]
CG3 --> Vec[(Vector DB)]
CG2 --> CRM[(HubSpot/Salesforce)]
CG5 --> Train[Training pipeline]
Each consumer group reads independently with its own offset. Share groups (KIP-932) layer queue semantics on top of the log so workloads that don't want partition-pinned ordering get cooperative consumption.
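The delivery semantics of the two modes are easy to confuse. A toy in-memory model — not the Kafka API, just an illustration of who receives what (`fan_out`, `share_out`, and the member names are invented for this sketch):

```python
from collections import defaultdict
from itertools import cycle

events = [f"call-{i}" for i in range(6)]

def fan_out(events, groups):
    # Consumer groups: every group independently reads the full stream,
    # each tracking its own offsets.
    return {g: list(events) for g in groups}

def share_out(events, members):
    # Share group (KIP-932 semantics, modeled): members cooperatively
    # split the stream; each record is delivered to exactly one member,
    # with no partition pinning and per-message acknowledgement.
    delivered = defaultdict(list)
    for member, e in zip(cycle(members), events):
        delivered[member].append(e)
    return dict(delivered)

fan = fan_out(events, ["billing", "embeddings"])
share = share_out(events, ["ml-worker-1", "ml-worker-2"])
# fan: both groups see all 6 events; share: each worker sees 3.
```

If every downstream system needs the whole stream (billing, CRM, embeddings), use consumer groups; if a pool of workers just needs to drain unordered work (ML training), a share group fits better.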
CallSphere implementation
CallSphere fans out call.completed events from all 6 verticals (Real Estate, Healthcare, IT Services, Salon, After-hours, Sales) into a single Kafka topic with 24 partitions. The Real Estate OneRoof pod uses NATS internally for the agent bus, then the orchestrator emits a final call.completed event into Kafka for analytics. Other verticals follow the same pattern. Pricing $149/$499/$1499; 14-day trial, 22% affiliate. 37 agents · 90+ tools · 115+ DB tables.
Build steps with code
- Bootstrap the KRaft cluster: bin/kafka-storage.sh format -t $UUID -c kraft.properties.
- Create the topic with appropriate retention: bin/kafka-topics.sh --create --topic call.completed --partitions 24 --replication-factor 3 --config retention.ms=604800000 (7 days).
- Set min.insync.replicas=2 and acks=all on producers for ELR safety.
- Use the new consumer group protocol: group.protocol=consumer (KIP-848).
- Schema-registry the payload (CloudEvents 1.0 envelope, see post #12).
- Wire a Kafka Connect sink for ClickHouse and S3 archival.
- Monitor under-replicated partitions and consumer lag in Grafana.
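Keying records by call_id means all events for one call land on the same partition, preserving per-call ordering. For intuition, here's an illustrative Python port of the murmur2 scheme Kafka's default partitioner uses for keyed records — verify against your client library before relying on exact placement:

```python
def murmur2(data: bytes) -> int:
    # 32-bit murmur2 as used by Kafka's default partitioner (illustrative port).
    M, R, MASK = 0x5bd1e995, 24, 0xFFFFFFFF
    h = (0x9747B28C ^ len(data)) & MASK
    for i in range(0, len(data) - len(data) % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * M) & MASK
        k ^= k >> R
        k = (k * M) & MASK
        h = ((h * M) & MASK) ^ k
    tail = data[len(data) - len(data) % 4:]
    if len(tail) >= 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * M) & MASK
    h ^= h >> 13
    h = (h * M) & MASK
    h ^= h >> 15
    return h

def partition_for(key: bytes, num_partitions: int) -> int:
    # Same key -> same partition, for a fixed partition count.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

This is also why the "partition count too low" pitfall bites: changing the partition count reshuffles key-to-partition assignments, so pick 24 up front rather than resizing later.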
from confluent_kafka import Producer, Consumer

# --- Producer: emit call.completed once per finished call ---
p = Producer({
    "bootstrap.servers": "kafka1:9092,kafka2:9092,kafka3:9092",
    "acks": "all",               # wait for all in-sync replicas (ELR-safe)
    "enable.idempotence": True,  # no duplicates on producer retry
    "compression.type": "zstd",
})

# call_id and cloudevent_envelope_json come from the call pipeline.
p.produce(
    topic="call.completed",
    key=call_id.encode(),  # per-call ordering via key hashing
    value=cloudevent_envelope_json,
    headers=[("ce-type", b"com.callsphere.call.completed.v1")],
)
p.flush()

# --- Consumer: one group per downstream system ---
c = Consumer({
    "bootstrap.servers": "kafka1:9092",
    "group.id": "embeddings",
    "group.protocol": "consumer",  # KIP-848 next-gen rebalance
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,   # commit only after a successful upsert
})
c.subscribe(["call.completed"])

while True:
    msg = c.poll(1.0)
    if msg and not msg.error():
        upsert_embedding(msg.value())
        c.commit(msg)  # commit this message's offset
Common pitfalls
- Partition count too low — 6 partitions caps you at 6 parallel consumers per group; pick partitions for your peak parallelism, not your current load.
- acks=1 with idempotence disabled — silent data loss on broker failover.
- One topic per event type explosion — group related events under call.* topics with header-based filtering downstream.
- Skipping the new consumer protocol — the old protocol still works, but rebalances stall on every deploy.
- Treating retention.ms as backup — Kafka is a buffer, not a database; tier to S3 with KIP-405.
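The call.* consolidation pitfall implies a small filter on the consumer side: drop records whose type you don't handle before doing any work. A minimal sketch, assuming the ce-type header set by the producer above (wanted/WANTED are illustrative names, not a library API):

```python
# Record types this consumer handles; everything else is skipped cheaply.
WANTED = {b"com.callsphere.call.completed.v1"}

def wanted(headers) -> bool:
    # confluent_kafka returns headers as a list of (str, bytes) tuples,
    # or None when no headers were set.
    if not headers:
        return False
    return any(k == "ce-type" and v in WANTED for k, v in headers)
```

In the poll loop, call wanted(msg.headers()) and commit skipped records too, so unhandled types don't show up as consumer lag.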
FAQ
Do we still need ZooKeeper? No. Kafka 4.0 runs only in KRaft mode. Existing ZooKeeper clusters must complete the KRaft migration on a 3.x bridge release first, then upgrade to 4.0.
Share groups vs consumer groups? Share groups (KIP-932) are queue-like — multiple consumers ack individual messages, no partition pinning. Use for unordered work like ML inference.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
When are ELRs GA? Preview in 4.0, expected GA in a 4.x release. Until then, treat them as a safety preview.
How does CallSphere price this? Kafka costs land in our infra, not customer pricing — see /pricing for $149/$499/$1499 plans.
Can I demo a Kafka-fed analytics dashboard? Yes — book a demo.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available — no signup required.