---
title: "Embeddings and Vector Representations: How LLMs Understand Meaning"
description: "Learn what embeddings are, how they capture semantic meaning as vectors, how to use embedding models for search and clustering, and the role cosine similarity plays in AI applications."
canonical: https://callsphere.ai/blog/embeddings-vector-representations-how-llms-understand-meaning
category: "Learn Agentic AI"
tags: ["Embeddings", "Vector Search", "Cosine Similarity", "Semantic Search", "RAG"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T09:33:30.609Z
---

# Embeddings and Vector Representations: How LLMs Understand Meaning

> Learn what embeddings are, how they capture semantic meaning as vectors, how to use embedding models for search and clustering, and the role cosine similarity plays in AI applications.

## What Are Embeddings?

An embedding is a dense vector of floating-point numbers that represents the meaning of a piece of text. Instead of treating words as arbitrary symbols, embeddings place them in a continuous mathematical space where similar meanings are close together and different meanings are far apart.

The sentences "The dog chased the cat" and "A canine pursued the feline" would have very similar embedding vectors, even though they share no words. This is the foundation of semantic search, RAG, recommendation systems, and many other AI applications.
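A toy illustration makes this concrete. The 3-dimensional vectors below are hand-picked for demonstration (real models learn vectors with hundreds or thousands of dimensions from data), but they show how closeness in the space mirrors closeness in meaning:

```python
import numpy as np

# Hand-picked toy vectors; real models learn these during training
dog    = np.array([0.90, 0.80, 0.10])
canine = np.array([0.85, 0.75, 0.15])
banana = np.array([0.10, 0.20, 0.90])

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(dog, canine))  # close to 1.0: near-synonyms
print(cosine(dog, banana))  # much lower: unrelated concepts
```

The near-synonyms score close to 1.0 while the unrelated pair scores far lower, which is exactly the property that search and clustering exploit.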

## From Words to Vectors

Every LLM starts by converting tokens into embedding vectors; this lookup is the very first operation in the model, before any attention or other computation happens. The same idea powers retrieval: in a RAG pipeline, the query is embedded and matched against a vector database before the LLM ever sees it:

```mermaid
flowchart LR
    Q(["User query"])
    EMB["Embed query
text-embedding-3"]
    VEC[("Vector DB
pgvector or Pinecone")]
    RET["Top-k retrieval
k = 8"]
    PROMPT["Augmented prompt
system plus context"]
    LLM["LLM generation
Claude or GPT"]
    CITE["Inline citations
and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
import numpy as np

# Simplified embedding lookup
# In a real model, these are learned during training
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
embedding_dim = 4

# Embedding matrix: vocab_size x embedding_dim
# Each row is the learned vector for one token
embedding_matrix = np.array([
    [0.2, 0.1, -0.3, 0.5],   # "the"
    [0.8, -0.2, 0.6, 0.1],   # "cat"
    [-0.1, 0.7, 0.3, -0.4],  # "sat"
    [0.3, 0.0, -0.1, 0.8],   # "on"
    [0.5, 0.4, -0.2, 0.3],   # "mat"
])

def embed_tokens(tokens, embedding_matrix, vocab):
    """Look up embedding vectors for a sequence of tokens."""
    indices = [vocab[t] for t in tokens]
    return embedding_matrix[indices]  # Shape: (seq_len, embedding_dim)

sentence = ["the", "cat", "sat", "on", "the", "mat"]
embeddings = embed_tokens(sentence, embedding_matrix, vocab)
print(f"Shape: {embeddings.shape}")  # (6, 4) — 6 tokens, 4 dimensions each
```

In production models, the embedding dimension is much larger — 1536 for OpenAI's text-embedding-3-small, 3072 for text-embedding-3-large. These higher dimensions allow the model to capture more nuanced distinctions in meaning.

## Using Embedding Models in Practice

Modern embedding models convert entire passages of text into a single vector that captures the overall meaning:

```python
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    """Get the embedding vector for a piece of text."""
    response = client.embeddings.create(
        input=text,
        model=model,
    )
    return response.data[0].embedding

# Embed some sample texts
texts = [
    "How to train a machine learning model",
    "Steps for building an ML pipeline",
    "Best Italian restaurants in New York",
    "NYC dining guide for pasta lovers",
    "Understanding neural network backpropagation",
]

embeddings = [get_embedding(text) for text in texts]

print(f"Embedding dimension: {len(embeddings[0])}")  # 1536
print(f"Number of texts embedded: {len(embeddings)}")
```

## Cosine Similarity: Measuring Meaning Distance

The standard way to compare embeddings is cosine similarity. It measures the angle between two vectors, ignoring their magnitudes. Values range from -1 (opposite directions) to 1 (same direction); in practice, scores from real text-embedding models tend to land in a narrower positive band, so treat them as relative rankings rather than absolute thresholds:

```python
import numpy as np

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare all pairs of our sample texts
print("Similarity matrix:")
print(f"{'':>46s}", end="")  # pad past the 46-character row labels
for i in range(len(texts)):
    print(f"  [{i}]", end="")
print()

for i, text_a in enumerate(texts):
    print(f"[{i}] {text_a[:40]:>42s}", end="")
    for j, text_b in enumerate(texts):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        print(f" {sim:.2f}", end="")
    print()

# Expected results:
# [0] and [1] — high similarity (both about ML training)
# [2] and [3] — high similarity (both about NYC restaurants)
# [0] and [2] — low similarity (ML vs restaurants)
```
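The nested loop above is fine for a handful of texts, but all pairwise similarities can also be computed in one shot: L2-normalize the rows of the embedding matrix, and the product of the matrix with its own transpose is the full cosine-similarity matrix. A sketch with random vectors standing in for real embeddings:

```python
import numpy as np

def cosine_similarity_matrix(embeddings) -> np.ndarray:
    """All pairwise cosine similarities via one matrix product."""
    X = np.asarray(embeddings, dtype=np.float64)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    return X @ X.T

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))  # stand-in for 5 embeddings of dimension 8
sims = cosine_similarity_matrix(emb)

print(sims.shape)                     # (5, 5)
print(round(float(sims[0, 0]), 6))    # 1.0: each vector matches itself exactly
```

Because normalized dot products are what BLAS libraries are optimized for, this vectorized form scales far better than Python-level loops.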

## Building a Semantic Search Engine

Embeddings power semantic search — finding documents by meaning rather than keyword matching. Here is a complete implementation:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

class SemanticSearchEngine:
    """Simple in-memory semantic search using OpenAI embeddings."""

    def __init__(self, model="text-embedding-3-small"):
        self.model = model
        self.documents = []
        self.embeddings = []

    def add_documents(self, documents: list[str]):
        """Embed and index a list of documents."""
        response = client.embeddings.create(
            input=documents,
            model=self.model,
        )
        new_embeddings = [item.embedding for item in response.data]

        self.documents.extend(documents)
        self.embeddings.extend(new_embeddings)

    def search(self, query: str, top_k: int = 3) -> list[dict]:
        """Find the most semantically similar documents."""
        # Embed the query
        query_embedding = client.embeddings.create(
            input=query,
            model=self.model,
        ).data[0].embedding

        # Compute similarity against all documents
        similarities = [
            cosine_similarity(query_embedding, doc_emb)
            for doc_emb in self.embeddings
        ]

        # Return top-k results
        ranked = sorted(
            enumerate(similarities),
            key=lambda x: x[1],
            reverse=True,
        )[:top_k]

        return [
            {"document": self.documents[idx], "score": score}
            for idx, score in ranked
        ]

# Usage
engine = SemanticSearchEngine()
engine.add_documents([
    "Python is a high-level programming language known for readability.",
    "JavaScript runs in web browsers and on Node.js servers.",
    "PostgreSQL is an advanced open-source relational database.",
    "Redis is an in-memory data structure store used as cache.",
    "Docker packages applications into portable containers.",
    "Kubernetes orchestrates container deployment at scale.",
])

results = engine.search("How do I store data efficiently?")
for r in results:
    print(f"  Score: {r['score']:.3f} | {r['document']}")
```
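One note on the `search` method: a Python-level loop over every stored embedding gets slow as the corpus grows. Keeping the embeddings in a single NumPy matrix with unit-length rows reduces the whole scoring step to one matrix-vector product. A sketch, with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend corpus: 10,000 documents embedded at 1536 dimensions
doc_matrix = rng.normal(size=(10_000, 1536))
doc_matrix /= np.linalg.norm(doc_matrix, axis=1, keepdims=True)

# Pretend query embedding, also unit length
query = rng.normal(size=1536)
query /= np.linalg.norm(query)

scores = doc_matrix @ query              # all cosine similarities at once
top_k = np.argsort(scores)[::-1][:3]     # indices of the 3 best matches
print(top_k, scores[top_k])
```

For unit-length vectors, the dot product *is* the cosine similarity, so no per-document normalization is needed at query time.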

## Vector Databases: Scaling Beyond Memory

For production applications with millions of documents, you need a vector database that can perform approximate nearest neighbor (ANN) search efficiently:

```python
# Using ChromaDB — a popular open-source vector database
import chromadb

client_db = chromadb.PersistentClient(path="./chroma_data")

# Create a collection with automatic embedding
collection = client_db.get_or_create_collection(
    name="knowledge_base",
    metadata={"hnsw:space": "cosine"},  # Use cosine similarity
)

# Add documents — ChromaDB handles embedding automatically
collection.add(
    documents=[
        "Machine learning models learn patterns from data.",
        "Neural networks are inspired by biological neurons.",
        "Gradient descent optimizes model parameters iteratively.",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"topic": "ml", "difficulty": "beginner"},
        {"topic": "dl", "difficulty": "intermediate"},
        {"topic": "optimization", "difficulty": "advanced"},
    ],
)

# Query with metadata filtering
results = collection.query(
    query_texts=["How do AI models improve over time?"],
    n_results=2,
    where={"difficulty": {"$ne": "advanced"}},  # Exclude advanced docs
)

print(results["documents"])
print(results["distances"])  # Lower = more similar for cosine
```
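With the cosine space configured above, ChromaDB reports distances rather than similarities, where distance = 1 - cosine similarity (hence "lower = more similar"). Converting back is a one-liner; the distance values below are illustrative, not from a real query:

```python
# Example distance values in place of results["distances"][0]
distances = [0.12, 0.35]

# For cosine space: similarity = 1 - distance
similarities = [1.0 - d for d in distances]
print([round(s, 2) for s in similarities])  # [0.88, 0.65]
```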

## Embedding Models: Choosing the Right One

Different embedding models offer different trade-offs:

```python
# Compare embedding dimensions and performance
embedding_models = {
    "text-embedding-3-small": {"dim": 1536, "cost_per_M": 0.02, "provider": "OpenAI"},
    "text-embedding-3-large": {"dim": 3072, "cost_per_M": 0.13, "provider": "OpenAI"},
    "voyage-3": {"dim": 1024, "cost_per_M": 0.06, "provider": "Voyage AI"},
    "all-MiniLM-L6-v2": {"dim": 384, "cost_per_M": 0.00, "provider": "Local (HuggingFace)"},
}

for model, info in embedding_models.items():
    print(f"{model:30s} | dim={info['dim']:5d} | ${info['cost_per_M']:.2f}/M tokens | {info['provider']}")
```

For local embedding without API calls, use the sentence-transformers library:

```python
from sentence_transformers import SentenceTransformer

# Download and load the model locally — no API key needed
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How to deploy a Python application",
    "Steps for shipping a Python app to production",
    "Best pizza places in Chicago",
]

# Generate embeddings locally
embeddings = model.encode(sentences)
print(f"Shape: {embeddings.shape}")  # (3, 384)

# Compute similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity between first two: {similarity.item():.3f}")
```

## FAQ

### What is the difference between embeddings from an LLM and from an embedding model?

LLMs like GPT-4 produce internal embeddings as part of text generation, but these are not exposed through the API. Embedding models like text-embedding-3-small are specifically trained to produce embeddings optimized for similarity comparison. They are smaller, faster, and cheaper than full LLMs, and their embeddings are better suited for search and retrieval. Use embedding models for search and RAG; use LLMs for text generation.

### How many dimensions should my embeddings have?

It depends on the complexity of your data and your storage budget. 384 dimensions (MiniLM) work well for many applications and are very storage-efficient. 1536 dimensions (text-embedding-3-small) capture more nuance and are the sweet spot for most production use. 3072 dimensions (text-embedding-3-large) offer marginal gains for specialized tasks. OpenAI's text-embedding-3 models support dimension reduction via the `dimensions` parameter, letting you choose your trade-off point.
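OpenAI documents that text-embedding-3 vectors can be shortened by truncating trailing dimensions and re-normalizing, which is effectively what the `dimensions` parameter does server-side. The local equivalent looks like this, with a random vector standing in for a real 3072-dimension embedding:

```python
import numpy as np

def shorten_embedding(vec, dims: int) -> np.ndarray:
    """Truncate an embedding to its first `dims` values and re-normalize to unit length."""
    short = np.asarray(vec, dtype=np.float64)[:dims]
    return short / np.linalg.norm(short)

full = np.random.default_rng(7).normal(size=3072)  # stand-in for a real embedding
short = shorten_embedding(full, 256)

print(short.shape)                                  # (256,)
print(round(float(np.linalg.norm(short)), 6))       # 1.0: unit length preserved
```

Shortened vectors lose some precision but cut storage and search cost proportionally, which is why this knob is worth benchmarking on your own retrieval data.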

### Can I update embeddings without re-embedding everything?

No. If you change the embedding model or its version, all existing embeddings become incompatible and must be regenerated. This is because different models map texts to different vector spaces — a vector from model A is meaningless in model B's space. Plan for re-indexing when upgrading embedding models. Some teams maintain version numbers on their vector collections and run parallel indexes during transitions.
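One lightweight way to implement that versioning practice is to bake the model name and an index version into the collection name itself, so vectors from different models can never be mixed by accident. This is a naming convention, not a vector-database feature:

```python
EMBEDDING_MODEL = "text-embedding-3-small"
INDEX_VERSION = 2

def collection_name(base: str) -> str:
    """Derive a collection name tied to the embedding model and index version."""
    model_slug = EMBEDDING_MODEL.replace("-", "_")
    return f"{base}__{model_slug}__v{INDEX_VERSION}"

print(collection_name("knowledge_base"))
# knowledge_base__text_embedding_3_small__v2
```

Bumping `INDEX_VERSION` (or swapping the model) yields a fresh collection name, making it natural to build the new index in parallel and cut traffic over only when it is fully populated.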

---

#Embeddings #VectorSearch #CosineSimilarity #SemanticSearch #RAG #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/embeddings-vector-representations-how-llms-understand-meaning
