---
title: "Weaviate Tutorial: GraphQL-Powered Vector Search with Built-In Modules"
description: "Learn to set up Weaviate, design schemas with vectorizer modules, import data, and run hybrid keyword-plus-vector searches using Weaviate's GraphQL API and Python client."
canonical: https://callsphere.ai/blog/weaviate-tutorial-graphql-vector-search-modules
category: "Learn Agentic AI"
tags: ["Weaviate", "Vector Database", "GraphQL", "Hybrid Search", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T08:05:17.180Z
---

# Weaviate Tutorial: GraphQL-Powered Vector Search with Built-In Modules

> Learn to set up Weaviate, design schemas with vectorizer modules, import data, and run hybrid keyword-plus-vector searches using Weaviate's GraphQL API and Python client.

## What Makes Weaviate Different

Weaviate is an open-source vector database with two distinctive features: a GraphQL API for flexible querying and a modular architecture that plugs in embedding models, rerankers, and generative AI directly at the database level. Instead of embedding documents in your application code and sending vectors to the database, Weaviate can handle vectorization internally using modules like `text2vec-openai` or `text2vec-cohere`.

This module-based approach simplifies your application code. You send raw text to Weaviate, and it generates, stores, and indexes the embeddings automatically. Combined with hybrid search (keyword BM25 + vector similarity), Weaviate is a strong choice for applications that need both traditional and semantic search.

## Setting Up Weaviate with Docker

The diagram below shows where a vector database like Weaviate sits in a typical ingestion and retrieval pipeline:

```mermaid
flowchart TD
    DOC(["Document"])
    CHUNK["Chunker
recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata
source, page, tenant"]
    INDEX[("HNSW or IVF index
in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search
cosine similarity"]
    FILTER["Metadata filter
tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff
```

The fastest way to run Weaviate locally is with Docker Compose. Create a `docker-compose.yml`:

```yaml
services:
services:
  weaviate:
    image: cr.weaviate.io/semitechnologies/weaviate:1.28.0
    ports:
      - "8080:8080"
      - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      DEFAULT_VECTORIZER_MODULE: "text2vec-openai"
      ENABLE_MODULES: "text2vec-openai,generative-openai"
      OPENAI_APIKEY: ${OPENAI_APIKEY}  # read from your shell environment; avoid hardcoding keys
      CLUSTER_HOSTNAME: "node1"
```

Start the server:

```bash
docker compose up -d
```

Install the Python client:

```bash
pip install weaviate-client
```

## Connecting to Weaviate

```python
import weaviate
from weaviate.classes.init import Auth

# Option 1: local instance (anonymous access enabled above)
client = weaviate.connect_to_local()

# Option 2: Weaviate Cloud — use one or the other, not both
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-cluster.weaviate.network",
    auth_credentials=Auth.api_key("your-weaviate-api-key"),
    headers={"X-OpenAI-Api-Key": "sk-..."}  # forwarded to the vectorizer module
)

print(client.is_ready())  # True once the server is reachable

client.close()  # release connections when done
```

## Designing a Schema

In Weaviate, a collection (formerly called a "class") defines the structure of your data. Each collection has properties and a vectorizer configuration:

```python
from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    name="Article",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small"
    ),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
        Property(name="word_count", data_type=DataType.INT),
    ]
)
```

Weaviate will automatically vectorize the `TEXT` properties when you insert data. You can skip vectorization for specific properties by setting `skip_vectorization=True`.
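
As a sketch, here are two hypothetical properties (`internal_id` and `summary` are illustrative names, not part of the schema above) that opt out of different parts of vectorization:

```python
from weaviate.classes.config import Property, DataType

extra_properties = [
    # Stored and filterable, but excluded from the embedding entirely
    Property(name="internal_id", data_type=DataType.TEXT, skip_vectorization=True),
    # Vectorize the value but not the property name itself
    Property(name="summary", data_type=DataType.TEXT, vectorize_property_name=False),
]
```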

## Importing Data

Insert objects and Weaviate generates embeddings automatically:

```python
articles = client.collections.get("Article")

articles.data.insert({
    "title": "Introduction to Vector Databases",
    "content": "Vector databases store and search high-dimensional embeddings...",
    "category": "databases",
    "word_count": 450
})

# Batch import for large datasets (documents is any iterable of dicts)
with articles.batch.dynamic() as batch:
    for doc in documents:
        batch.add_object(properties={
            "title": doc["title"],
            "content": doc["content"],
            "category": doc["category"],
            "word_count": doc["word_count"]
        })

# Check for per-object failures after the batch context exits
if articles.batch.failed_objects:
    print(f"{len(articles.batch.failed_objects)} objects failed to import")
```
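
The `documents` iterable can come from any source. As one sketch, assuming a JSONL file (one JSON object per line) containing the same four fields used above, a small standard-library loader might look like this:

```python
import json
from pathlib import Path
from typing import Iterator


def load_documents(path: str) -> Iterator[dict]:
    """Yield one article dict per non-blank line of a JSONL file."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Usage with the batch import above:
# with articles.batch.dynamic() as batch:
#     for doc in load_documents("articles.jsonl"):
#         batch.add_object(properties=doc)
```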

## Vector Search (nearText)

Search by semantic meaning without computing embeddings yourself:

```python
from weaviate.classes.query import MetadataQuery

articles = client.collections.get("Article")

response = articles.query.near_text(
    query="how vector similarity search works",
    limit=5,
    return_metadata=MetadataQuery(distance=True)
)

for obj in response.objects:
    print(f"{obj.properties['title']} (distance: {obj.metadata.distance:.4f})")
```
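
The Python client is a convenience layer; the same search can be expressed against Weaviate's GraphQL endpoint (`POST /v1/graphql`) directly. A minimal standard-library sketch, assuming anonymous access on `localhost:8080` and the `Article` collection defined earlier:

```python
import json
import urllib.request


def near_text_graphql(concept: str, limit: int = 5) -> str:
    """Build a GraphQL nearText query against the Article collection."""
    return f"""
    {{
      Get {{
        Article(nearText: {{concepts: ["{concept}"]}}, limit: {limit}) {{
          title
          _additional {{ distance }}
        }}
      }}
    }}"""


def run_query(query: str, url: str = "http://localhost:8080/v1/graphql") -> dict:
    """POST a GraphQL query to Weaviate and return the decoded JSON response."""
    body = json.dumps({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Against a running instance:
# result = run_query(near_text_graphql("how vector similarity search works"))
# for hit in result["data"]["Get"]["Article"]:
#     print(hit["title"], hit["_additional"]["distance"])
```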

## Hybrid Search

Combine BM25 keyword relevance and vector similarity in a single ranked result list:

```python
response = articles.query.hybrid(
    query="PostgreSQL vector extension performance",
    limit=5,
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
    return_metadata=MetadataQuery(score=True)
)

for obj in response.objects:
    print(f"{obj.properties['title']} (score: {obj.metadata.score:.4f})")
```

The `alpha` parameter controls the balance. Start at 0.5 and adjust based on your use case — content with specialized terminology often benefits from a lower alpha that weights keyword matching more heavily.
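
To build intuition for what `alpha` does, here is a simplified, pure-Python sketch of relative score fusion (Weaviate's default fusion method in recent versions): each result list is min-max normalized, then blended by `alpha`. The document IDs and scores below are made up for illustration:

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Normalize a non-empty score map to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def fuse(bm25: dict[str, float], vector: dict[str, float],
         alpha: float) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha weights the vector side, 1 - alpha the keyword side."""
    nb, nv = min_max(bm25), min_max(vector)
    ids = set(bm25) | set(vector)  # a doc may appear in only one result list
    fused = {d: alpha * nv.get(d, 0.0) + (1 - alpha) * nb.get(d, 0.0) for d in ids}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores for three documents
bm25 = {"doc_a": 7.2, "doc_b": 3.1, "doc_c": 5.5}
vector = {"doc_a": 0.61, "doc_b": 0.88, "doc_c": 0.70}

print(fuse(bm25, vector, alpha=0.5))  # balanced blend
print(fuse(bm25, vector, alpha=0.9))  # mostly semantic: doc_b rises to the top
```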

## Filtering Results

Apply structured filters alongside vector or hybrid search. Filter conditions compose with `&` (and) and `|` (or):

```python
from weaviate.classes.query import Filter

response = articles.query.near_text(
    query="database performance",
    limit=10,
    filters=Filter.by_property("category").equal("databases") &
            Filter.by_property("word_count").greater_than(200)
)
```

## FAQ

### Do I need to generate embeddings in my application code when using Weaviate?

No. Weaviate's vectorizer modules handle embedding generation at the database level. You send raw text, and the configured module (like `text2vec-openai`) generates and stores the embedding. You only need to manage embeddings yourself if you use the `none` vectorizer and provide your own vectors.

### What is the difference between nearText and nearVector queries?

`nearText` sends your query string to the vectorizer module, which generates an embedding and then searches. `nearVector` accepts a pre-computed vector directly. Use `nearText` for simplicity; use `nearVector` when you embed queries externally or want to reuse embeddings across multiple searches.

### Can Weaviate run without any cloud API dependencies?

Yes. Use the `text2vec-transformers` module instead of `text2vec-openai`. This runs a transformer model locally inside a Docker container alongside Weaviate. It is slower and uses more memory but requires no external API calls or keys.
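
A sketch of the Docker Compose changes for a fully local setup — the inference image shown is one example of Weaviate's prebuilt transformer containers, so substitute the model that fits your data:

```yaml
services:
  weaviate:
    # ...same as above, with these environment changes:
    environment:
      DEFAULT_VECTORIZER_MODULE: "text2vec-transformers"
      ENABLE_MODULES: "text2vec-transformers"
      TRANSFORMERS_INFERENCE_API: "http://t2v-transformers:8080"
  t2v-transformers:
    image: cr.weaviate.io/semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: "0"  # set to "1" if a GPU is available
```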

---

#Weaviate #VectorDatabase #GraphQL #HybridSearch #Python #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/weaviate-tutorial-graphql-vector-search-modules
