Postgres + Citus for Sharding: Scale AI Workloads Past 10TB (2026)
Citus turns Postgres into a distributed database via row-based or schema-based sharding. Pick the right model for AI workloads, distribute pgvector tables, and watch query plans actually parallelize.
TL;DR — When a single Postgres node can't keep up — typically past 10 TB or 50k writes/sec — Citus shards the table across worker nodes while keeping the SQL surface intact. Schema-based sharding (Citus 12+) is the right default for multi-tenant AI workloads.
What you'll build
A 1 coordinator + 4 worker Citus cluster running pgvector across shards, with schema-per-tenant isolation and parallel ANN queries.
Schema
CREATE EXTENSION citus;
CREATE EXTENSION vector;
-- Schema-based sharding (Citus 12+)
CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.documents (
id BIGSERIAL PRIMARY KEY,
body TEXT,
embedding vector(1536)
);
SELECT citus_schema_distribute('tenant_acme');
-- Row-based sharding for shared tables
CREATE TABLE shared_events (
tenant_id UUID,
event_id UUID,
ts TIMESTAMPTZ,
payload JSONB,
PRIMARY KEY (tenant_id, event_id)
);
SELECT create_distributed_table('shared_events', 'tenant_id');
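To confirm what actually got distributed, query Citus's metadata views. The sketch below assumes the citus_tables and citus_shards views that ship with Citus 10 and later; it's a sanity check, not a required step.
-- Every distributed table, with its sharding model and shard count
SELECT table_name, citus_table_type, distribution_column, shard_count
FROM citus_tables;
-- Where each shard of shared_events landed
SELECT shardid, nodename, nodeport, shard_size
FROM citus_shards
WHERE table_name = 'shared_events'::regclass;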
Architecture
flowchart TD
APP[App] --> COORD[Coordinator]
COORD --> W1[Worker 1<br/>shards 0-7]
COORD --> W2[Worker 2<br/>shards 8-15]
COORD --> W3[Worker 3<br/>shards 16-23]
COORD --> W4[Worker 4<br/>shards 24-31]
W1 --> S1[(Shard data)]
W2 --> S2[(Shard data)]
W3 --> S3[(Shard data)]
W4 --> S4[(Shard data)]
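The 32-shard split in the diagram is simply the Citus default. Shard count is fixed at distribution time, so if you want a different layout, set citus.shard_count before calling create_distributed_table; a minimal sketch:
-- 32 is the Citus default; raise it here if you need more per-table parallelism
SET citus.shard_count = 32;
SHOW citus.shard_count;   -- what the next create_distributed_table call will use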
Step 1 — Cluster setup
# On coordinator + each worker
echo "shared_preload_libraries = 'citus'" >> postgresql.conf
sudo systemctl restart postgresql
# On every node
psql -c "CREATE EXTENSION citus;"
# On the coordinator only
psql -c "SELECT citus_set_coordinator_host('coord.internal', 5432);"
psql -c "SELECT citus_add_node('worker1.internal', 5432);"
psql -c "SELECT citus_add_node('worker2.internal', 5432);"
psql -c "SELECT citus_add_node('worker3.internal', 5432);"
psql -c "SELECT citus_add_node('worker4.internal', 5432);"
Step 2 — Distribute the table
-- Row-based: hash by tenant_id
SELECT create_distributed_table('shared_events', 'tenant_id');
-- Schema-based: each tenant gets a schema, distributed
CREATE SCHEMA tenant_acme;
-- tables created here...
SELECT citus_schema_distribute('tenant_acme');
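Citus 12 also ships a citus_schemas view; assuming that version, it's the quickest way to confirm a schema is distributed and to see how large each tenant has grown:
-- One row per distributed schema (Citus 12+)
SELECT schema_name, colocation_id, schema_size FROM citus_schemas;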
Step 3 — Build pgvector indexes per shard
CREATE INDEX documents_hnsw_idx
ON tenant_acme.documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Citus propagates the CREATE INDEX to every shard, so each worker builds its own HNSW graph over just its local rows.
Step 4 — Parallel ANN queries
EXPLAIN SELECT id FROM tenant_acme.documents
ORDER BY embedding <=> $1::vector LIMIT 10;
For schema-based sharding, the query routes to the single worker that owns that schema. For row-based sharding, the planner runs the ORDER BY ... LIMIT on every shard in parallel, then merges the per-shard top-k results on the coordinator.
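The routing difference is easy to see on the row-based table: pin the distribution column and the plan collapses from a fan-out across all shards to a single-shard query. The UUID below is only a placeholder:
-- Fans out to every shard of shared_events; partial results merge on the coordinator
EXPLAIN (COSTS OFF)
SELECT count(*) FROM shared_events;
-- Routed to the single shard that owns this tenant
EXPLAIN (COSTS OFF)
SELECT count(*) FROM shared_events
WHERE tenant_id = '00000000-0000-0000-0000-000000000000';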
Step 5 — Co-located joins
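The snippet below references a messages table that doesn't appear elsewhere in this post; here is a minimal, illustrative definition so the co-location call and the join have something to work with (note the primary key must include the distribution column):
CREATE TABLE messages (
tenant_id UUID,
event_id UUID,
id BIGSERIAL,
body TEXT,
PRIMARY KEY (tenant_id, id)
);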
SELECT create_distributed_table('messages', 'tenant_id', colocate_with => 'shared_events');
-- Co-located join runs on each worker, no cross-node traffic
SELECT m.id, e.event_id FROM messages m
JOIN shared_events e USING (tenant_id, event_id)
WHERE m.tenant_id = $1;
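To check that both tables landed in the same co-location group, compare their colocation_id in the citus_tables view; matching IDs mean joins on tenant_id never leave a worker:
SELECT table_name, colocation_id
FROM citus_tables
WHERE table_name IN ('shared_events'::regclass, 'messages'::regclass);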
Step 6 — Rebalance shards
SELECT citus_add_node('worker5.internal', 5432);
SELECT rebalance_table_shards('shared_events');
Online rebalance — no downtime, throttled to keep the workers responsive.
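rebalance_table_shards blocks the session that calls it, so watch progress from a second session; Citus 11.1+ also offers a background variant. Both are sketched below, under those version assumptions:
-- From another session while the foreground rebalance runs
SELECT * FROM get_rebalance_progress();
-- Citus 11.1+ alternative: move shards in the background and poll for status
SELECT citus_rebalance_start();
SELECT * FROM citus_rebalance_status();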
Pitfalls
- Cross-shard transactions — possible but slower. Co-locate to avoid them.
- Foreign keys cross shards — only allowed within a co-location group.
- Schema-based with too few tenants — overhead beats benefit under ~20 tenants.
- Forgetting to distribute reference tables — small lookup tables should use create_reference_table (see the sketch after this list).
- SEQUENCE on a non-distribution column — use BIGSERIAL with care; for global uniqueness consider UUIDs.
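A sketch of the reference-table fix, using a hypothetical event_types lookup; every worker keeps a full copy, so joins against it stay local:
CREATE TABLE event_types (
type_code TEXT PRIMARY KEY,
description TEXT
);
SELECT create_reference_table('event_types');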
CallSphere production note
CallSphere's largest verticals (Healthcare and OneRoof) currently run on dedicated single-node Postgres + pgvector — Citus is on the roadmap for when cross-tenant aggregations exceed 30 TB. Schema-based sharding maps cleanly onto our 115+ DB tables model, with Healthcare's HIPAA workload (Prisma healthcare_voice) keeping its own isolated cluster, OneRoof RLS preserved per shard, and UrackIT's Supabase + ChromaDB stack untouched. 37 agents · 90+ tools · 6 verticals. Plans: $149/$499/$1,499 — 14-day trial, 22% affiliate.
FAQ
Q: Citus or vanilla read replicas? Replicas scale reads only. Citus scales writes and storage.
Q: Can I use Citus on managed Postgres? Azure Cosmos DB for PostgreSQL is managed Citus; on other providers you self-host the extension.
Q: Does pgvector work distributed? Yes — index per shard, parallel scan.
Q: Schema-based or row-based? Schema-based for clear tenant boundaries; row-based for high-cardinality shared workloads.
Q: When NOT to use Citus? Under ~10 TB and ~50k writes/sec, vertical scaling is cheaper and simpler.