By Sagar Shankaran, Founder of CallSphere
Citus turns Postgres into a distributed database via row-based or schema-based sharding. Pick the right model for AI workloads, distribute pgvector tables, and watch query plans actually parallelize.
Key takeaways
TL;DR — When a single Postgres node can't keep up — typically past 10 TB or 50k writes/sec — Citus shards the table across worker nodes while keeping the SQL surface intact. Schema-based sharding (Citus 12+) is the right default for multi-tenant AI workloads.
A 1 coordinator + 4 worker Citus cluster running pgvector across shards, with schema-per-tenant isolation and parallel ANN queries.
CREATE EXTENSION citus;
CREATE EXTENSION vector;
-- Schema-based sharding (Citus 12+)
CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.documents (
id BIGSERIAL PRIMARY KEY,
body TEXT,
embedding vector(1536)
);
SELECT citus_schema_distribute('tenant_acme');
-- Row-based sharding for shared tables
CREATE TABLE shared_events (
tenant_id UUID,
event_id UUID,
ts TIMESTAMPTZ,
payload JSONB,
PRIMARY KEY (tenant_id, event_id)
);
SELECT create_distributed_table('shared_events', 'tenant_id');
flowchart TD
APP[App] --> COORD[Coordinator]
COORD --> W1[Worker 1<br/>shards 0-7]
COORD --> W2[Worker 2<br/>shards 8-15]
COORD --> W3[Worker 3<br/>shards 16-23]
COORD --> W4[Worker 4<br/>shards 24-31]
W1 --> S1[(Shard data)]
W2 --> S2[(Shard data)]
W3 --> S3[(Shard data)]
W4 --> S4[(Shard data)]
# On coordinator + each worker
echo "shared_preload_libraries = 'citus'" >> postgresql.conf
sudo systemctl restart postgresql
# On coordinator
psql -c "CREATE EXTENSION citus;"
psql -c "SELECT citus_set_coordinator_host('coord.internal', 5432);"
psql -c "SELECT citus_add_node('worker1.internal', 5432);"
psql -c "SELECT citus_add_node('worker2.internal', 5432);"
psql -c "SELECT citus_add_node('worker3.internal', 5432);"
psql -c "SELECT citus_add_node('worker4.internal', 5432);"
-- Row-based: hash by tenant_id
SELECT create_distributed_table('shared_events', 'tenant_id');
-- Schema-based: each tenant gets a schema, distributed
CREATE SCHEMA tenant_acme;
-- tables created here...
SELECT citus_schema_distribute('tenant_acme');
CREATE INDEX documents_hnsw_idx
ON tenant_acme.documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Citus pushes the index down to every shard automatically.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
EXPLAIN SELECT id FROM tenant_acme.documents
ORDER BY embedding <=> $1::vector LIMIT 10;
For schema-based, the query routes to the worker that owns that schema. For row-based, the planner runs the query on every shard in parallel and merges results.
SELECT create_distributed_table('messages', 'tenant_id', colocate_with => 'shared_events');
-- Co-located join runs on each worker, no cross-node traffic
SELECT m.id, e.event_id FROM messages m
JOIN shared_events e USING (tenant_id, event_id)
WHERE m.tenant_id = $1;
SELECT citus_add_node('worker5.internal', 5432);
SELECT rebalance_table_shards('shared_events');
Online rebalance — no downtime, throttled to keep the workers responsive.
create_reference_table.CallSphere's largest verticals (Healthcare and OneRoof) currently run on dedicated single-node Postgres + pgvector — Citus is on the roadmap for when cross-tenant aggregations exceed 30 TB. Schema-based sharding maps cleanly onto our 115+ DB tables model, with Healthcare's HIPAA workload (Prisma healthcare_voice) keeping its own isolated cluster, OneRoof RLS preserved per shard, and UrackIT's Supabase + ChromaDB stack untouched. 37 agents · 90+ tools · 6 verticals. Plans: $149/$499/$1,499 — 14-day trial, 22% affiliate.
Q: Citus or vanilla read replicas? Replicas scale reads only. Citus scales writes and storage.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Q: Can I use Citus on managed Postgres? Azure Cosmos DB for PostgreSQL = managed Citus. Self-host elsewhere.
Q: Does pgvector work distributed? Yes — index per shard, parallel scan.
Q: Schema-based or row-based? Schema-based for clear tenant boundaries; row-based for high-cardinality shared workloads.
Q: When NOT to use Citus? Under ~10 TB and ~50k writes/sec, vertical scaling is cheaper and simpler.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
See how AI voice agents work for your industry. Live demo available -- no signup required.
Your agent's memory, embeddings, and conversation state all live in Postgres. Backups must include vector data and survive a full-region loss. Here's how CallSphere does PITR for 115+ tables.
Live news studios in 2026 deploy an AI fact-checker behind every anchor, validating claims against trusted sources and offering on-air corrections within 30 seconds. Here is the production stack.
Sharding patterns that hold up beyond 100M vectors. The 2026 designs for partition keys, replication, and rebalancing.
pgvector 0.9 brings hybrid search, binary vectors, and improved indexing primitives. Why Postgres-native vector is good enough for most teams in 2026 honestly.
Mem0 on a single Postgres + pgvector instance, end to end. Schema, indexes, and the queries that keep latency under 200ms even with millions of memory records.
pgvector 0.8 with binary quantization cut HNSW build time 150x and hits 471 QPS at 99% recall on 50M vectors. Here is the production tuning guide for Postgres-shop teams.
© 2026 CallSphere LLC. All rights reserved.
Watch how CallSphere handles real customer calls, schedules appointments, and processes payments — live.
Try Live DemoBook a DemoCalculate Your ROI