---
title: "Migrating Vector Databases: Moving Embeddings Between Pinecone, pgvector, and Weaviate"
description: "Learn how to migrate vector embeddings between Pinecone, pgvector, and Weaviate. Covers export formats, re-embedding decisions, index tuning, and verification strategies."
canonical: https://callsphere.ai/blog/migrating-vector-databases-pinecone-pgvector-weaviate-embeddings
category: "Learn Agentic AI"
tags: ["Vector Database", "Pinecone", "pgvector", "Weaviate", "Embeddings", "Migration"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:44.670Z
---

# Migrating Vector Databases: Moving Embeddings Between Pinecone, pgvector, and Weaviate

> Learn how to migrate vector embeddings between Pinecone, pgvector, and Weaviate. Covers export formats, re-embedding decisions, index tuning, and verification strategies.

## When Vector Database Migration Makes Sense

Teams migrate vector databases for several reasons: cost optimization (Pinecone's managed pricing vs. self-hosted pgvector), consolidation (reducing infrastructure complexity by using pgvector alongside your existing PostgreSQL), or capability requirements (Weaviate's hybrid search combining vectors with BM25 keyword matching).

The critical decision in any vector migration is whether to copy existing embeddings or re-embed from source documents. This choice affects migration time, cost, and whether you can change embedding models simultaneously.

## Decision: Copy Vectors or Re-Embed?

```python
def should_re_embed(
    source_model: str,
    target_model: str,
    source_dimensions: int,
    target_dimensions: int,
    document_count: int,
) -> dict:
    """Decide whether to copy vectors or re-embed."""
    must_re_embed = (
        source_model != target_model
        or source_dimensions != target_dimensions
    )

    # Estimate re-embedding cost (OpenAI text-embedding-3-small)
    avg_tokens_per_doc = 500
    cost_per_million_tokens = 0.02
    estimated_cost = (
        document_count * avg_tokens_per_doc / 1_000_000
        * cost_per_million_tokens
    )

    return {
        "re_embed_required": must_re_embed,
        "reason": (
            "Model or dimension mismatch"
            if must_re_embed
            else "Same model, direct copy possible"
        ),
        "estimated_cost_usd": round(estimated_cost, 2),
        "estimated_time_minutes": round(document_count / 2000, 1),
    }

result = should_re_embed(
    source_model="text-embedding-ada-002",
    target_model="text-embedding-3-small",
    source_dimensions=1536,
    target_dimensions=1536,
    document_count=100_000,
)
print(result)
# Model mismatch -> must re-embed
```

## Exporting from Pinecone

```python
from pinecone import Pinecone

def export_from_pinecone(
    api_key: str,
    index_name: str,
    namespace: str = "",
    batch_size: int = 100,
) -> list[dict]:
    """Export all vectors and metadata from a Pinecone index."""
    pc = Pinecone(api_key=api_key)
    index = pc.Index(index_name)

    stats = index.describe_index_stats()
    total = stats.total_vector_count
    print(f"Exporting {total} vectors from Pinecone")

    all_vectors = []
    # Page through all vector IDs, then fetch full vectors in batches
    for ids_batch in index.list(namespace=namespace, limit=batch_size):
        fetch_result = index.fetch(ids=ids_batch, namespace=namespace)
        for vec_id, vec_data in fetch_result.vectors.items():
            all_vectors.append({
                "id": vec_id,
                "values": vec_data.values,
                "metadata": vec_data.metadata,
            })

    print(f"Exported {len(all_vectors)} vectors")
    return all_vectors
```

## Importing into pgvector

```python
import asyncpg
import json

async def import_to_pgvector(
    vectors: list[dict],
    db_url: str,
    table_name: str = "embeddings",
    dimensions: int = 1536,
):
    """Import vectors into a pgvector table."""
    conn = await asyncpg.connect(db_url)

    # Ensure extension and table exist
    await conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    await conn.execute(f"""
        CREATE TABLE IF NOT EXISTS {table_name} (
            id TEXT PRIMARY KEY,
            embedding vector({dimensions}),
            metadata JSONB,
            created_at TIMESTAMPTZ DEFAULT now()
        )
    """)

    # Batch insert via executemany; pgvector accepts the vector as a
    # bracketed string literal cast to the vector type
    rows = [
        (
            vec["id"],
            "[" + ",".join(str(v) for v in vec["values"]) + "]",
            json.dumps(vec.get("metadata", {})),
        )
        for vec in vectors
    ]
    await conn.executemany(
        f"""INSERT INTO {table_name} (id, embedding, metadata)
            VALUES ($1, $2::vector, $3::jsonb)
            ON CONFLICT (id) DO NOTHING""",
        rows,
    )
    imported = len(rows)

    # Create HNSW index for fast similarity search
    await conn.execute(f"""
        CREATE INDEX IF NOT EXISTS idx_{table_name}_embedding
        ON {table_name}
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 200)
    """)

    await conn.close()
    print(f"Imported {imported} vectors into pgvector")
```
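
Once the import completes, a similarity query against the new table uses pgvector's `<=>` cosine-distance operator (table and column names follow the code above; the query vector is passed as a parameter the same way the inserts were):

```sql
-- Top-5 nearest neighbors by cosine distance
SELECT id, metadata, 1 - (embedding <=> $1::vector) AS cosine_similarity
FROM embeddings
ORDER BY embedding <=> $1::vector
LIMIT 5;
```

The `ORDER BY embedding <=> $1::vector` form is what lets PostgreSQL use the HNSW index; wrapping the distance expression in a function call can force a sequential scan.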

## Re-Embedding When Models Change

```python
from openai import OpenAI

client = OpenAI()

def re_embed_documents(
    documents: list[dict],
    model: str = "text-embedding-3-small",
    batch_size: int = 100,
) -> list[dict]:
    """Re-embed documents with a new model."""
    results = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        texts = [doc["text"] for doc in batch]

        response = client.embeddings.create(
            model=model,
            input=texts,
        )
        for doc, emb in zip(batch, response.data):
            results.append({
                "id": doc["id"],
                "values": emb.embedding,
                "metadata": doc.get("metadata", {}),
            })
    return results
```

## Verification: Ensure Search Quality Is Preserved

```python
async def verify_migration(
    test_queries: list[str],
    source_search_fn,
    target_search_fn,
    top_k: int = 10,
) -> dict:
    """Compare search results between source and target."""
    overlap_scores = []

    for query in test_queries:
        source_ids = set(source_search_fn(query, top_k))
        target_ids = set(target_search_fn(query, top_k))

        overlap = len(source_ids & target_ids) / top_k
        overlap_scores.append(overlap)

    avg_overlap = sum(overlap_scores) / max(len(overlap_scores), 1)
    return {
        "avg_result_overlap": round(avg_overlap, 3),
        "queries_tested": len(test_queries),
        "perfect_matches": sum(1 for s in overlap_scores if s == 1.0),
    }
```

## FAQ

### Can I copy embeddings directly between different vector databases?

Yes, if you are keeping the same embedding model. Vectors are just arrays of floats — the database does not care which model produced them. Export the vectors with their metadata and import them into the new database. The key constraint is that dimensions must match.

```mermaid
flowchart TD
    DOC(["Document"])
    CHUNK["Chunker
recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata
source, page, tenant"]
    INDEX[("HNSW or IVF index
in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search
cosine similarity"]
    FILTER["Metadata filter
tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff
```
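
As a quick guard before a direct copy, a sketch like this (a hypothetical helper, not from the migration code above) flags any exported vectors whose length does not match the target schema:

```python
def check_dimensions(vectors: list[dict], expected_dim: int) -> list[str]:
    """Return IDs of vectors whose length does not match the target schema."""
    return [
        vec["id"]
        for vec in vectors
        if len(vec["values"]) != expected_dim
    ]

mismatched = check_dimensions(
    [
        {"id": "a", "values": [0.1] * 1536},
        {"id": "b", "values": [0.2] * 768},
    ],
    expected_dim=1536,
)
print(mismatched)  # ['b']
```

Running this before the import catches mixed-model corpora early, rather than failing partway through with a dimension error from the target database.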

### How long does re-embedding 1 million documents take?

With OpenAI's embedding API at roughly 2,000 documents per minute (respecting rate limits), re-embedding 1 million documents takes about 8-9 hours. You can parallelize with multiple API keys or use a local model like BAAI/bge-large-en to eliminate rate limits entirely.
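
The 8-9 hour figure follows directly from the throughput arithmetic; a quick sketch (the 2,000 documents per minute default is the rate assumed above, not a guaranteed API limit):

```python
def estimate_re_embed_hours(
    document_count: int,
    docs_per_minute: int = 2000,
) -> float:
    """Estimate wall-clock hours to re-embed a corpus at a fixed throughput."""
    return round(document_count / docs_per_minute / 60, 1)

print(estimate_re_embed_hours(1_000_000))  # 8.3
```

Doubling throughput (e.g. a second worker with its own rate limit) halves the estimate linearly, since embedding calls are embarrassingly parallel.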

### Should I tune HNSW index parameters after migration?

Yes. The default parameters (m=16, ef_construction=64) work for most cases, but if you need higher recall, increase ef_construction to 200 and m to 24. Run benchmark queries with different ef_search values to find the right recall-speed tradeoff for your use case.
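
One way to run that benchmark is to compare the ANN results at each candidate `ef_search` value against exact (brute-force) nearest neighbors and measure recall@k. A minimal, database-agnostic recall helper:

```python
def recall_at_k(exact_ids: list[str], ann_ids: list[str], k: int) -> float:
    """Fraction of the exact top-k neighbors the ANN index also returned."""
    exact_top = set(exact_ids[:k])
    return len(exact_top & set(ann_ids[:k])) / k

# Example: the ANN index missed one of the true top-5 neighbors
print(recall_at_k(["a", "b", "c", "d", "e"], ["a", "b", "c", "d", "x"], k=5))  # 0.8
```

Sweep `ef_search` upward until recall plateaus, then pick the smallest value that meets your target (e.g. recall@10 above 0.95) to keep query latency low.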

---

#VectorDatabase #Pinecone #Pgvector #Weaviate #Embeddings #Migration #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/migrating-vector-databases-pinecone-pgvector-weaviate-embeddings
