---
title: "Custom Model Providers with OpenAI Agents SDK: Using Any LLM as Your Agent Brain"
description: "Learn how to implement the Model protocol in OpenAI Agents SDK to connect any LLM — Anthropic Claude, local Ollama models, or custom endpoints — as your agent's reasoning engine with full tool-calling support."
canonical: https://callsphere.ai/blog/custom-model-providers-openai-agents-sdk-any-llm-agent-brain
category: "Learn Agentic AI"
tags: ["OpenAI Agents SDK", "Custom Model Provider", "LLM Integration", "Anthropic", "Ollama", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T19:51:09.548Z
---

# Custom Model Providers with OpenAI Agents SDK: Using Any LLM as Your Agent Brain

> Learn how to implement the Model protocol in OpenAI Agents SDK to connect any LLM — Anthropic Claude, local Ollama models, or custom endpoints — as your agent's reasoning engine with full tool-calling support.

## Why Custom Model Providers Matter

The OpenAI Agents SDK ships with built-in support for OpenAI models, but production teams rarely use a single LLM vendor. You might need Claude for nuanced reasoning, a local Llama model for cost-sensitive tasks, or a fine-tuned endpoint for domain-specific work. The SDK's Model protocol lets you swap in any LLM without changing your agent logic.

This decoupling is the key architectural insight: your agent's behavior (instructions, tools, handoffs) stays the same regardless of which model powers the reasoning.

## Understanding the Model Protocol

The SDK defines a `Model` protocol that any custom provider must implement. At its core, you need to provide a single method — `get_response` — that accepts the agent's conversation history and returns a structured response.

```mermaid
flowchart LR
    INPUT(["User input"])
    AGENT["Agent
name plus instructions"]
    HAND{"Handoff to
another agent?"}
    SUB["Sub-agent
specialist"]
    GUARD{"Guardrail
passed?"}
    TOOL["Tool call"]
    SDK[("Tracing
OpenAI dashboard")]
    OUT(["Final output"])
    INPUT --> AGENT --> HAND
    HAND -->|Yes| SUB --> GUARD
    HAND -->|No| GUARD
    GUARD -->|Yes| TOOL --> AGENT
    GUARD -->|Block| OUT
    AGENT --> OUT
    AGENT --> SDK
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
from __future__ import annotations
from agents import Agent, Model, ModelProvider, Runner
from agents.items import ModelResponse, TResponseInputItem
from agents.usage import Usage
from typing import Any
import anthropic

class AnthropicModel(Model):
    """Custom model that routes agent calls to Anthropic Claude."""

    def __init__(self, model_name: str = "claude-sonnet-4-20250514"):
        self.model_name = model_name
        self.client = anthropic.AsyncAnthropic()

    async def get_response(
        self,
        system_instructions: str | None,
        input: list[TResponseInputItem],
        model_settings: Any,
        tools: list,
        output_schema: Any | None,
        handoffs: list,
        tracing: Any,
    ) -> ModelResponse:
        # Convert SDK messages to Anthropic format
        messages = self._convert_messages(input)

        response = await self.client.messages.create(
            model=self.model_name,
            max_tokens=model_settings.max_tokens or 4096,
            system=system_instructions or "",
            messages=messages,
            # "or" would silently override temperature=0.0, so test for None
            temperature=(
                model_settings.temperature
                if model_settings.temperature is not None
                else 0.7
            ),
        )

        return self._convert_response(response)

    def _convert_messages(self, input_items):
        """Transform SDK input items to Anthropic message format."""
        messages = []
        for item in input_items:
            if hasattr(item, "role") and hasattr(item, "content"):
                messages.append({
                    "role": item.role if item.role != "system" else "user",
                    "content": item.content,
                })
        return messages if messages else [{"role": "user", "content": "Hello"}]

    def _convert_response(self, response):
        """Transform Anthropic response back to SDK format."""
        # Build output items from response content blocks
        output_text = ""
        for block in response.content:
            if block.type == "text":
                output_text += block.text

        return ModelResponse(
            output=[],  # Simplified: wrap output_text in the SDK's output item types
            usage=Usage(
                requests=1,
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
                total_tokens=response.usage.input_tokens + response.usage.output_tokens,
            ),
            response_id=response.id,
        )
```

## Building a Custom Model Provider

A `ModelProvider` maps model name strings to `Model` instances. This lets you register multiple backends under a single provider.

```python
class MultiModelProvider(ModelProvider):
    """Routes model names to different LLM backends."""

    def __init__(self):
        self._models: dict[str, Model] = {}

    def register(self, name: str, model: Model):
        self._models[name] = model

    def get_model(self, model_name: str | None) -> Model:
        if model_name and model_name in self._models:
            return self._models[model_name]
        raise ValueError(f"Unknown model: {model_name}")

# Register providers
provider = MultiModelProvider()
provider.register("claude-sonnet", AnthropicModel("claude-sonnet-4-20250514"))
provider.register("claude-haiku", AnthropicModel("claude-haiku-4-20250514"))
```

## Connecting a Local Ollama Model

For local inference, you can implement a `Model` that calls Ollama's HTTP API.

```python
import httpx

class OllamaModel(Model):
    def __init__(self, model_name: str = "llama3", base_url: str = "http://localhost:11434"):
        self.model_name = model_name
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=120.0)

    async def get_response(
        self, system_instructions, input, model_settings,
        tools, output_schema, handoffs, tracing,
    ):
        messages = []
        if system_instructions:
            messages.append({"role": "system", "content": system_instructions})
        for item in input:
            if hasattr(item, "role"):
                messages.append({"role": item.role, "content": item.content})

        resp = await self.client.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model_name, "messages": messages, "stream": False},
        )
        data = resp.json()
        return self._build_response(data)

    def _build_response(self, data: dict) -> ModelResponse:
        """Transform Ollama's /api/chat reply into SDK format."""
        # Non-streaming replies carry the assistant turn under "message"
        # and token counts under "prompt_eval_count" / "eval_count"
        return ModelResponse(
            output=[],  # Simplified: wrap data["message"]["content"] in SDK output items
            usage=Usage(
                requests=1,
                input_tokens=data.get("prompt_eval_count", 0),
                output_tokens=data.get("eval_count", 0),
            ),
            response_id=None,
        )
```
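
To make the local model addressable through the same provider, register it next to the Claude backends (the `llama3-local` name here is arbitrary):

```python
provider.register("llama3-local", OllamaModel("llama3"))
```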

## Wiring It Into Your Agent

Once your provider is ready, reference one of its model names on the agent and pass the provider through the run configuration.

```python
import asyncio

from agents import RunConfig

agent = Agent(
    name="research_assistant",
    instructions="You are a helpful research assistant.",
    model="claude-sonnet",  # This name is resolved by the provider
)

async def main():
    result = await Runner.run(
        agent,
        input="Summarize the latest advances in quantum computing.",
        run_config={"model_provider": provider},
    )
    print(result.final_output)

asyncio.run(main())
```

The agent code has zero awareness of which vendor is running under the hood. Switching from Claude to a local Llama model is a one-line configuration change.
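
Concretely, assuming the `llama3-local` registration from earlier, the switch is just the model name:

```python
agent = Agent(
    name="research_assistant",
    instructions="You are a helpful research assistant.",
    model="llama3-local",  # Previously "claude-sonnet"; same instructions, same tools
)
```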

## When to Use Custom Providers

Custom model providers solve real production problems: **cost optimization** by routing simple tasks to cheaper models, **compliance** by keeping sensitive data on local models, **redundancy** by failing over between vendors, and **specialization** by directing domain tasks to fine-tuned endpoints.
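
As a sketch of the redundancy pattern, a wrapper model can retry a secondary backend when the primary raises. The `FailoverModel` name and the broad `except` are illustrative; production code would narrow the exception types:

```python
class FailoverModel(Model):
    """Tries a primary backend and falls back to a secondary on error."""

    def __init__(self, primary: Model, secondary: Model):
        self.primary = primary
        self.secondary = secondary

    async def get_response(self, *args, **kwargs) -> ModelResponse:
        try:
            return await self.primary.get_response(*args, **kwargs)
        except Exception:
            # Narrow this to transient failures (timeouts, rate limits)
            # and log the failover in a real deployment
            return await self.secondary.get_response(*args, **kwargs)

provider.register("claude-or-local", FailoverModel(AnthropicModel(), OllamaModel()))
```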

## FAQ

### Can I use tool calling with custom model providers?

Yes, but your custom `Model` implementation must convert the SDK's tool definitions into whatever format your target LLM expects. For Anthropic, this means transforming the JSON schema into Claude's tool format. For local models without native tool calling, you can inject tool descriptions into the system prompt and parse the output yourself.
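
As a rough sketch, assuming each tool exposes `name`, `description`, and `params_json_schema` the way the SDK's `FunctionTool` does, the Anthropic conversion might look like:

```python
def to_anthropic_tools(tools: list) -> list[dict]:
    """Map SDK function-tool definitions to Anthropic's tool format."""
    return [
        {
            "name": tool.name,
            "description": tool.description,
            "input_schema": tool.params_json_schema,  # JSON Schema for the arguments
        }
        for tool in tools
    ]
```

Pass the result as `tools` to `client.messages.create`, then map any `tool_use` content blocks in the reply back into SDK tool-call items.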

### Does streaming work with custom providers?

The `Model` interface defines a `stream_response` method alongside `get_response`. Implement it as an async iterator that yields the SDK's stream-event types as chunks arrive. If your backend can't stream, you can yield the full response as a single final event; callers still get the complete answer, just all at once.
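
For Ollama, the transport side of streaming might look like the sketch below, reusing the `httpx` client from earlier; converting each delta into the SDK's stream-event types is backend-specific and omitted here:

```python
import json

async def iter_ollama_deltas(client: httpx.AsyncClient, base_url: str, payload: dict):
    """Yield incremental text deltas from Ollama's streaming /api/chat endpoint."""
    # payload must include "stream": True; Ollama then emits one JSON object per line
    async with client.stream("POST", f"{base_url}/api/chat", json=payload) as resp:
        async for line in resp.aiter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if not chunk.get("done"):
                yield chunk["message"]["content"]
```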

### How do I handle authentication for multiple providers?

Each `Model` instance manages its own authentication. Store API keys in environment variables and read them in each model's constructor. Avoid passing keys through the agent layer — the model provider encapsulates all vendor-specific details.
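
For example, the Anthropic constructor from earlier can fail fast when its key is missing (the client would also read `ANTHROPIC_API_KEY` on its own if the argument were omitted):

```python
import os

class AnthropicModel(Model):
    def __init__(self, model_name: str = "claude-sonnet-4-20250514"):
        self.model_name = model_name
        # Raises KeyError at construction time when the key is absent
        self.client = anthropic.AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
```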

