
Custom Model Providers with OpenAI Agents SDK: Using Any LLM as Your Agent Brain

Learn how to implement the Model protocol in OpenAI Agents SDK to connect any LLM — Anthropic Claude, local Ollama models, or custom endpoints — as your agent's reasoning engine with full tool-calling support.

Why Custom Model Providers Matter

The OpenAI Agents SDK ships with built-in support for OpenAI models, but production teams rarely use a single LLM vendor. You might need Claude for nuanced reasoning, a local Llama model for cost-sensitive tasks, or a fine-tuned endpoint for domain-specific work. The SDK's Model protocol lets you swap in any LLM without changing your agent logic.

This decoupling is the key architectural insight: your agent's behavior (instructions, tools, handoffs) stays the same regardless of which model powers the reasoning.

Understanding the Model Protocol

The SDK defines a Model protocol that any custom provider must implement. At its core, you need to provide a single method — get_response — that accepts the agent's conversation history and returns a structured response.

from __future__ import annotations

from typing import Any

import anthropic

from agents import Agent, Runner, Model, ModelProvider
from agents.models import ModelResponse, ModelUsage
from agents.items import TResponseInputItem

class AnthropicModel(Model):
    """Custom model that routes agent calls to Anthropic Claude."""

    def __init__(self, model_name: str = "claude-sonnet-4-20250514"):
        self.model_name = model_name
        self.client = anthropic.AsyncAnthropic()

    async def get_response(
        self,
        system_instructions: str | None,
        input: list[TResponseInputItem],
        model_settings: Any,
        tools: list,
        output_schema: Any | None,
        handoffs: list,
        tracing: Any,
    ) -> ModelResponse:
        # Convert SDK messages to Anthropic format
        messages = self._convert_messages(input)

        response = await self.client.messages.create(
            model=self.model_name,
            max_tokens=model_settings.max_tokens or 4096,
            system=system_instructions or "",
            messages=messages,
            temperature=model_settings.temperature or 0.7,
        )

        return self._convert_response(response)

    def _convert_messages(self, input_items):
        """Transform SDK input items to Anthropic message format."""
        messages = []
        for item in input_items:
            if hasattr(item, "role") and hasattr(item, "content"):
                messages.append({
                    "role": item.role if item.role != "system" else "user",
                    "content": item.content,
                })
        return messages if messages else [{"role": "user", "content": "Hello"}]

    def _convert_response(self, response):
        """Transform Anthropic response back to SDK format."""
        # Collect the text from the response content blocks
        output_text = "".join(
            block.text for block in response.content if block.type == "text"
        )

        return ModelResponse(
            output=[],  # Simplified: wrap output_text in proper output items for production use
            usage=ModelUsage(
                input_tokens=response.usage.input_tokens,
                output_tokens=response.usage.output_tokens,
                requests=1,
            ),
            response_id=response.id,
        )
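The role-coercion behavior of _convert_messages can be sanity-checked outside the SDK. This sketch substitutes plain dicts for SDK input items (an assumption for illustration); since Anthropic takes the system prompt as a separate parameter, any "system" role in the history is coerced to "user":

```python
def convert_messages(items):
    """Mirror of _convert_messages using plain dicts instead of SDK items.
    Anthropic accepts the system prompt separately, so "system" roles
    in the history are coerced to "user"."""
    messages = []
    for item in items:
        role, content = item.get("role"), item.get("content")
        if role and content:
            messages.append(
                {"role": "user" if role == "system" else role, "content": content}
            )
    return messages or [{"role": "user", "content": "Hello"}]

history = [
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print([m["role"] for m in convert_messages(history)])  # ['user', 'user', 'assistant']
```

Note the fallback: Anthropic rejects an empty messages list, so an empty history gets a placeholder user turn.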

Building a Custom Model Provider

A ModelProvider maps model name strings to Model instances. This lets you register multiple backends under a single provider.

class MultiModelProvider(ModelProvider):
    """Routes model names to different LLM backends."""

    def __init__(self):
        self._models: dict[str, Model] = {}

    def register(self, name: str, model: Model):
        self._models[name] = model

    def get_model(self, model_name: str | None) -> Model:
        if model_name and model_name in self._models:
            return self._models[model_name]
        raise ValueError(f"Unknown model: {model_name}")


# Register providers
provider = MultiModelProvider()
provider.register("claude-sonnet", AnthropicModel("claude-sonnet-4-20250514"))
provider.register("claude-haiku", AnthropicModel("claude-haiku-4-20250514"))

Connecting a Local Ollama Model

For local inference, you can implement a provider that calls Ollama's HTTP API.


import httpx

class OllamaModel(Model):
    """Custom model that calls a local Ollama server over its HTTP API."""

    def __init__(self, model_name: str = "llama3", base_url: str = "http://localhost:11434"):
        self.model_name = model_name
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=120.0)

    async def get_response(
        self,
        system_instructions,
        input,
        model_settings,
        tools,
        output_schema,
        handoffs,
        tracing,
    ):
        messages = []
        if system_instructions:
            messages.append({"role": "system", "content": system_instructions})
        for item in input:
            if hasattr(item, "role"):
                messages.append({"role": item.role, "content": item.content})

        resp = await self.client.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model_name, "messages": messages, "stream": False},
        )
        resp.raise_for_status()
        return self._build_response(resp.json())

    def _build_response(self, data):
        """Map Ollama's /api/chat response body to the SDK's ModelResponse."""
        return ModelResponse(
            output=[],  # Simplified: wrap data["message"]["content"] in proper output items
            usage=ModelUsage(
                input_tokens=data.get("prompt_eval_count", 0),
                output_tokens=data.get("eval_count", 0),
                requests=1,
            ),
            response_id=None,
        )
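Ollama's non-streaming /api/chat response reports token counts in the prompt_eval_count and eval_count fields. A small standalone helper (illustrative, not part of the SDK) shows the mapping _build_response needs to perform:

```python
def parse_ollama_usage(data: dict) -> dict:
    """Extract token counts from an Ollama /api/chat response body."""
    return {
        "input_tokens": data.get("prompt_eval_count", 0),
        "output_tokens": data.get("eval_count", 0),
    }

sample = {
    "message": {"role": "assistant", "content": "Qubits enable superposition."},
    "prompt_eval_count": 12,
    "eval_count": 7,
}
print(parse_ollama_usage(sample))  # {'input_tokens': 12, 'output_tokens': 7}
```

The .get defaults matter: Ollama omits the count fields in some error paths, so defaulting to zero keeps usage accounting from crashing the run.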

Wiring It Into Your Agent

Once your provider is ready, pass it when creating an agent.

import asyncio

from agents import RunConfig

agent = Agent(
    name="research_assistant",
    instructions="You are a helpful research assistant.",
    model="claude-sonnet",  # This name is resolved by the provider
)

async def main():
    result = await Runner.run(
        agent,
        input="Summarize the latest advances in quantum computing.",
        run_config=RunConfig(model_provider=provider),
    )
    print(result.final_output)

asyncio.run(main())

The agent code has zero awareness of which vendor is running under the hood. Switching from Claude to a local Llama model is a one-line configuration change.

When to Use Custom Providers

Custom model providers solve real production problems: cost optimization by routing simple tasks to cheaper models, compliance by keeping sensitive data on local models, redundancy by failing over between vendors, and specialization by directing domain tasks to fine-tuned endpoints.
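Cost-based routing can be as simple as a heuristic sitting in front of the provider's get_model lookup. The threshold and model names below are illustrative assumptions, not SDK behavior:

```python
def pick_model(prompt: str, threshold: int = 400) -> str:
    """Route short prompts to a cheap model and long ones to a stronger one.
    Threshold and model names are illustrative assumptions."""
    return "claude-haiku" if len(prompt) < threshold else "claude-sonnet"

print(pick_model("What's the capital of France?"))   # claude-haiku
print(pick_model("Analyze this contract... " * 50))  # claude-sonnet
```

Production routers usually look at task type or tool requirements rather than raw prompt length, but the shape is the same: a pure function that returns a model name the provider can resolve.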

FAQ

Can I use tool calling with custom model providers?

Yes, but your custom Model implementation must convert the SDK's tool definitions into whatever format your target LLM expects. For Anthropic, this means transforming the JSON schema into Claude's tool format. For local models without native tool calling, you can inject tool descriptions into the system prompt and parse the output yourself.
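As a sketch of that conversion, an OpenAI-style function tool (name, description, parameters) maps onto Anthropic's name, description, input_schema layout. The helper and sample tool below are illustrative:

```python
def to_anthropic_tool(tool: dict) -> dict:
    """Convert an OpenAI-style function tool definition to Anthropic's
    tool format: the JSON schema under "parameters" becomes "input_schema"."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool.get("parameters", {"type": "object", "properties": {}}),
    }

weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
print(to_anthropic_tool(weather_tool)["input_schema"]["required"])  # ['city']
```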

Does streaming work with custom providers?

The SDK supports a get_stream_response method alongside get_response. Implement this method to return an async iterator of chunks. If you skip it, the SDK falls back to the non-streaming path, which still works but returns the full response at once.
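The streaming contract can be illustrated with a plain async generator. A real get_stream_response would yield SDK event objects rather than raw strings; this sketch assumes text chunks for simplicity:

```python
import asyncio

async def stream_chunks(chunks):
    """Stand-in for a streaming backend: yields text pieces as they arrive."""
    for chunk in chunks:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield chunk

async def consume() -> str:
    parts = []
    async for piece in stream_chunks(["Hel", "lo, ", "agent."]):
        parts.append(piece)
    return "".join(parts)

print(asyncio.run(consume()))  # Hello, agent.
```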

How do I handle authentication for multiple providers?

Each Model instance manages its own authentication. Store API keys in environment variables and read them in each model's constructor. Avoid passing keys through the agent layer — the model provider encapsulates all vendor-specific details.
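A minimal pattern for per-model key loading, assuming one environment variable per vendor (note that anthropic.AsyncAnthropic() already reads ANTHROPIC_API_KEY by default, so the explicit lookup is only needed for custom endpoints):

```python
import os

def load_key(var_name: str) -> str:
    """Read a provider API key from the environment; fail fast if missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} before constructing this model.")
    return key

# Each Model constructor reads its own variable, e.g.:
#   self.api_key = load_key("ANTHROPIC_API_KEY")
```

Failing fast in the constructor surfaces a missing key at startup instead of mid-run, when an agent is already holding partial state.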


#OpenAIAgentsSDK #CustomModelProvider #LLMIntegration #Anthropic #Ollama #Python #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

