AI Infrastructure · 12 min read

Streaming Call Transcripts Into ClickHouse for Sub-Second AI Voice Analytics in 2026

ClickHouse 26.2 added time-based block flushing and ClickPipes covers Kafka, S3, and Postgres CDC natively. Here's how CallSphere streams 50k voice agent transcripts a day into a ClickHouse cluster with sub-second p95 query latency.

TL;DR — ClickHouse 26.2 ships time-based block flushing (input_format_max_block_wait_ms), ClickPipes ingests Kafka and S3 natively, and BigLake-style federation lets you join cold archive with hot transcripts. For AI voice analytics, that means a single SQL surface from "this call ended 800 ms ago" to "all calls last quarter" — at sub-second latency.

Why this pipeline

A voice agent stack handling 10k–100k calls a day generates structured events that crush an OLTP database like Postgres once you start running aggregations: average sentiment by hour, top-10 intents in the last 5 minutes, lead-score histogram by vertical, talk/listen ratio by agent. ClickHouse is the canonical OLAP fit. The 2026 question is no longer "ClickHouse or not" — it's how you stream into it without batching delays or async-insert footguns.

The 26.2 release closed the last gap: low-throughput feeds (< 100 rows/sec) used to wait minutes for the default 1M-row block to flush. Time-based flushing fixes that, so a regional pod with 30 calls per minute still gets 3-second visibility.
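In config terms, here is a sketch of what that tuning could look like on a self-managed Kafka-engine consumer — assuming the 26.2 setting can be applied at this level under the name given above; broker, topic, group, and table names are illustrative:

```sql
-- Self-managed alternative to ClickPipes: a Kafka engine consumer.
-- input_format_max_block_wait_ms (the 26.2 setting discussed above) caps how
-- long a low-throughput feed waits before flushing a partial block: here, 3 s.
CREATE TABLE transcripts_kafka_src
(
  call_id UUID,
  ts      DateTime64(3),
  text    String
)
ENGINE = Kafka
SETTINGS
  kafka_broker_list = 'kafka:9092',
  kafka_topic_list  = 'call.transcript.partial',
  kafka_group_name  = 'ch-transcripts',
  kafka_format      = 'JSONEachRow',
  input_format_max_block_wait_ms = 3000;
```

With ClickPipes (the route used in this post) the equivalent knob is set during pipe configuration rather than in DDL.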

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Architecture

```mermaid
flowchart LR
  Voice[Voice agent<br/>OpenAI Realtime API] -->|partial transcripts| Kafka[(Kafka topic<br/>call.transcript.partial)]
  Voice -->|final transcript| Kafka2[(Kafka topic<br/>call.completed)]
  Kafka -->|ClickPipes| CH[(ClickHouse Cloud<br/>transcripts table<br/>MergeTree, ORDER BY call_id, ts)]
  Kafka2 -->|ClickPipes| CH
  CH --> MV[Materialized view:<br/>sentiment_5min_rollup]
  CH --> Dash[Grafana / Metabase]
  CH --> AGT[Internal AI agent<br/>read-only ClickHouse user]
```

Partial transcripts land within 800 ms of the speech token; final transcripts land within 2s of call end. A materialized view keeps a 5-minute sentiment rollup hot for the supervisor dashboard.
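To make that freshness concrete, this is the shape of query a supervisor dashboard can run directly against raw partials (a sketch against the call_transcripts table defined below):

```sql
-- Live view: partial transcript lines from the last 60 seconds.
SELECT call_id, speaker, ts, text
FROM call_transcripts
WHERE ts > now64(3) - INTERVAL 60 SECOND
ORDER BY ts DESC
LIMIT 100;
```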

CallSphere implementation

CallSphere runs 37 specialist agents across 6 verticals with 90+ tools and 115+ DB tables. Pricing is $149 Starter / $499 Growth / $1499 Scale with a 14-day trial and 22% affiliate program. Healthcare post-call analytics uses GPT-4o-mini to compute a sentiment score from -1.0 to 1.0 and a lead score from 0 to 100, written into the call_analytics table — all of it queryable from ClickHouse alongside transcripts. Browse plans at /pricing or take a /demo. Healthcare specifics live at /industries/healthcare.

Build steps with code

  1. Provision ClickHouse Cloud with a dedicated service for analytics; pick a region close to your agent pod.
  2. Create the table with a sane order key.
  3. Set up a Kafka topic call.transcript.partial with 12 partitions and 7-day retention.
  4. Wire ClickPipes in the ClickHouse Cloud UI: pick the Kafka source, select topic, map columns.
  5. Tune input_format_max_block_wait_ms=3000 for low-throughput regional pods.
  6. Add a materialized view for the supervisor rollup.
  7. Grant a read-only user to your internal AI agent for ad-hoc analytics.
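Step 7 as SQL, assuming the tables live in the default database; the user name and password here are placeholders:

```sql
-- Read-only identity for the internal AI agent (step 7).
CREATE USER IF NOT EXISTS agent_ro IDENTIFIED BY 'change-me';
GRANT SELECT ON call_transcripts TO agent_ro;
GRANT SELECT ON sentiment_5min_rollup TO agent_ro;
-- No INSERT/ALTER grants, so the agent can query but never mutate data.
```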
```sql
CREATE TABLE call_transcripts (
  call_id      UUID,
  vertical     LowCardinality(String),
  speaker      LowCardinality(String),  -- 'agent' | 'caller'
  ts           DateTime64(3),
  text         String,
  sentiment    Float32,
  lead_score   UInt8,
  pii_redacted UInt8 DEFAULT 0
)
ENGINE = MergeTree
ORDER BY (call_id, ts)
PARTITION BY toYYYYMM(ts)
TTL ts + INTERVAL 365 DAY;
```

```sql
CREATE MATERIALIZED VIEW sentiment_5min_rollup
ENGINE = AggregatingMergeTree
ORDER BY (vertical, bucket)
AS SELECT
  vertical,
  toStartOfFiveMinutes(ts) AS bucket,
  avgState(sentiment)      AS avg_sent,
  countState()             AS n
FROM call_transcripts
GROUP BY vertical, bucket;
```
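Because the rollup stores aggregate states (avgState, countState), reads must finalize them with the matching -Merge combinators, e.g.:

```sql
-- Finalize aggregate states for the last hour of 5-minute buckets.
SELECT
  vertical,
  bucket,
  avgMerge(avg_sent) AS avg_sentiment,
  countMerge(n)      AS transcript_rows
FROM sentiment_5min_rollup
WHERE bucket >= now() - INTERVAL 1 HOUR
GROUP BY vertical, bucket
ORDER BY vertical, bucket;
```

Selecting avg_sent directly would return an opaque state blob, not a number — the -Merge step is what collapses states into final values.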

Pitfalls

  • Naive INSERT per transcript chunk — you'll generate millions of tiny parts and hit the merge backlog. Always batch via ClickPipes or async_insert.
  • ORDER BY ts only — sparse index becomes useless for per-call lookups; lead with call_id.
  • No TTL — voice analytics balloons fast; set TTL ts + INTERVAL 365 DAY or your storage bill will.
  • Forgetting LowCardinality on vertical/speaker — 10x bigger storage on string columns with low cardinality.
  • Querying raw partials for dashboards — always go through a materialized view; the rollup is 100x cheaper.
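To check whether the first pitfall is already biting, count active parts per partition with the system tables; counts trending into the thousands mean your inserts are too small:

```sql
-- Merge-backlog check: active part count per partition.
SELECT partition, count() AS parts
FROM system.parts
WHERE table = 'call_transcripts' AND active
GROUP BY partition
ORDER BY parts DESC;
```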

FAQ

Why ClickHouse over Postgres + TimescaleDB? Once you cross roughly 50M rows of transcripts, ClickHouse is 10–50x faster for analytical scans. Timescale holds up into the tens of millions of rows; past that, the columnar format wins.

Can we query the call recording itself? No — store the audio in S3 and put the S3 URL in ClickHouse. Use ClickHouse only for the structured transcript and metrics.
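A column for the pointer is enough — sketched here as an ALTER with a hypothetical column name:

```sql
-- Store a pointer to the recording in S3, not the audio itself.
ALTER TABLE call_transcripts
  ADD COLUMN IF NOT EXISTS audio_url String DEFAULT '';
```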

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

How do we redact PII before it hits ClickHouse? Pipeline post #6 covers this. The short version: redact in Flink between Kafka and ClickHouse; never rely on ClickHouse to do it.

Latency goal? Sub-second p95 for dashboard queries; 3s ingest visibility for streaming feeds.

Multi-tenant? Yes — add a tenant_id column at the front of the order key and use row-policies for per-tenant isolation.
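Per-tenant isolation, sketched with hypothetical tenant and user names (assumes a tenant_id column added at the front of the order key as described):

```sql
-- Only rows for tenant 'acme' are visible to this user.
CREATE ROW POLICY IF NOT EXISTS acme_isolation ON call_transcripts
FOR SELECT USING tenant_id = 'acme' TO acme_readonly;
```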


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available, no signup required.