Custom Model Providers with OpenAI Agents SDK: Using Any LLM as Your Agent Brain

Learn how to implement the Model protocol in OpenAI Agents SDK to connect any LLM — Anthropic Claude, local Ollama models, or custom endpoints — as your agent's reasoning engine with full tool-calling support.

Why Custom Model Providers Matter

The OpenAI Agents SDK ships with built-in support for OpenAI models, but many production teams use more than one LLM vendor. You might need Claude for nuanced reasoning, a local Llama model for cost-sensitive tasks, or a fine-tuned endpoint for domain-specific work. The SDK's Model interface lets you swap in any LLM without changing your agent logic.

This decoupling is the key architectural insight: your agent's behavior (instructions, tools, handoffs) stays the same regardless of which model powers the reasoning.

Understanding the Model Protocol

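The SDK defines an abstract Model class that every custom backend subclasses. Its core method, get_response, receives the system instructions, the conversation history (a single string or a list of input items, which are plain dicts in the Responses API format), the model settings, and the agent's tools, handoffs, and output schema. It returns a ModelResponse holding output items and token usage. A second abstract method, stream_response, serves streamed runs and must be defined even if you only ever call Runner.run.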

flowchart LR
    INPUT(["User input"])
    AGENT["Agent<br/>name plus instructions"]
    HAND{"Handoff to<br/>another agent?"}
    SUB["Sub-agent<br/>specialist"]
    GUARD{"Guardrail<br/>passed?"}
    TOOL["Tool call"]
    SDK[("Tracing<br/>OpenAI dashboard")]
    OUT(["Final output"])
    INPUT --> AGENT --> HAND
    HAND -->|Yes| SUB --> GUARD
    HAND -->|No| GUARD
    GUARD -->|Yes| TOOL --> AGENT
    GUARD -->|Block| OUT
    AGENT --> OUT
    AGENT --> SDK
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
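
from __future__ import annotations

import json
from typing import Any, AsyncIterator

import anthropic
from agents import FunctionTool, ModelSettings
from agents.items import ModelResponse, TResponseInputItem, TResponseOutputItem
from agents.models.interface import Model, ModelTracing
from agents.usage import Usage
from openai.types.responses import (
    ResponseFunctionToolCall,
    ResponseOutputMessage,
    ResponseOutputText,
)


class AnthropicModel(Model):
    """Custom model that routes agent calls to Anthropic Claude."""

    def __init__(self, model_name: str = "claude-sonnet-4-20250514"):
        self.model_name = model_name
        # With no api_key argument, the client reads ANTHROPIC_API_KEY from the environment
        self.client = anthropic.AsyncAnthropic()

    async def get_response(
        self,
        system_instructions: str | None,
        input: str | list[TResponseInputItem],
        model_settings: ModelSettings,
        tools: list[Any],
        output_schema: Any | None,
        handoffs: list[Any],
        tracing: ModelTracing,
        **kwargs: Any,  # e.g. previous_response_id; extra keywords vary by SDK version
    ) -> ModelResponse:
        # Convert SDK input items to Anthropic's system prompt + messages
        system, messages = self._convert_messages(system_instructions, input)

        # Expose function tools and handoffs to Claude as tools (output_schema is not handled here)
        tool_defs = [
            {"name": t.name, "description": t.description, "input_schema": t.params_json_schema}
            for t in tools
            if isinstance(t, FunctionTool)
        ] + [
            {"name": h.tool_name, "description": h.tool_description, "input_schema": h.input_json_schema}
            for h in handoffs
        ]

        response = await self.client.messages.create(
            model=self.model_name,
            max_tokens=model_settings.max_tokens or 4096,
            system=system or anthropic.NOT_GIVEN,
            messages=messages,
            # Compare against None so an explicit temperature of 0 is respected
            temperature=0.7 if model_settings.temperature is None else model_settings.temperature,
            tools=tool_defs or anthropic.NOT_GIVEN,
        )
        return self._convert_response(response)

    async def stream_response(self, *args: Any, **kwargs: Any) -> AsyncIterator[Any]:
        # Also abstract on Model; this sketch supports Runner.run but not Runner.run_streamed
        raise NotImplementedError("Streaming is not implemented for AnthropicModel")
        yield  # unreachable, but makes this method an async generator

    @staticmethod
    def _text_of(content: Any) -> str:
        """Flatten message content (a string or a list of text parts) to plain text."""
        if isinstance(content, str):
            return content
        return "".join(part.get("text", "") for part in content or [])

    def _convert_messages(self, system_instructions, input):
        """Transform SDK input items (plain dicts) into Anthropic's format."""
        items = [{"role": "user", "content": input}] if isinstance(input, str) else input
        system_parts = [system_instructions] if system_instructions else []
        messages: list[dict[str, Any]] = []

        def add(role: str, block: dict[str, Any]) -> None:
            # Merge consecutive blocks from the same role into a single turn
            if messages and messages[-1]["role"] == role:
                messages[-1]["content"].append(block)
            else:
                messages.append({"role": role, "content": [block]})

        for item in items:
            kind = item.get("type", "message")
            if kind == "function_call":  # an earlier tool call made by the model
                add("assistant", {"type": "tool_use", "id": item["call_id"], "name": item["name"],
                                  "input": json.loads(item["arguments"] or "{}")})
            elif kind == "function_call_output":  # the SDK's result for that call
                add("user", {"type": "tool_result", "tool_use_id": item["call_id"],
                             "content": str(item["output"])})
            elif kind == "message":
                text = self._text_of(item.get("content"))
                if item.get("role") in ("system", "developer"):
                    system_parts.append(text)  # Anthropic takes system text separately
                elif text:
                    add(item["role"], {"type": "text", "text": text})
        return "\n\n".join(system_parts), messages

    def _convert_response(self, response) -> ModelResponse:
        """Transform Anthropic content blocks back into SDK output items."""
        output: list[TResponseOutputItem] = []
        text = "".join(block.text for block in response.content if block.type == "text")
        if text:
            output.append(ResponseOutputMessage(
                id=response.id, type="message", role="assistant", status="completed",
                content=[ResponseOutputText(type="output_text", text=text, annotations=[])],
            ))
        for block in response.content:
            if block.type == "tool_use":  # the runner executes these and calls the model again
                output.append(ResponseFunctionToolCall(
                    type="function_call", id=block.id, call_id=block.id,
                    name=block.name, arguments=json.dumps(block.input),
                ))

        usage = response.usage
        return ModelResponse(
            output=output,
            usage=Usage(
                requests=1,
                input_tokens=usage.input_tokens,
                output_tokens=usage.output_tokens,
                total_tokens=usage.input_tokens + usage.output_tokens,
            ),
            response_id=response.id,
        )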

Building a Custom Model Provider

A ModelProvider maps model name strings to Model instances. This lets you register multiple backends under a single provider.

from __future__ import annotations

from agents.models.interface import Model, ModelProvider

class MultiModelProvider(ModelProvider):
    """Routes model names to different LLM backends."""

    def __init__(self):
        self._models: dict[str, Model] = {}

    def register(self, name: str, model: Model):
        self._models[name] = model

    def get_model(self, model_name: str | None) -> Model:
        if model_name and model_name in self._models:
            return self._models[model_name]
        raise ValueError(f"Unknown model: {model_name!r}; registered: {sorted(self._models)}")

# Register models under short names (AnthropicModel is the class defined above)
provider = MultiModelProvider()
provider.register("claude-sonnet", AnthropicModel("claude-sonnet-4-20250514"))
provider.register("claude-haiku", AnthropicModel("claude-3-5-haiku-20241022"))  # any Haiku model ID you can access
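The registry above needs every name registered up front. A hypothetical variant routes on a prefix instead (for example `ollama/llama3`), builds and caches one backend per name, and sends unprefixed names to a default backend. The sketch below uses stand-in factories so it runs without the SDK; `PrefixRouter` and the `prefix/name` convention are assumptions for illustration, not SDK API.

```python
from typing import Any, Callable, Dict, Optional


class PrefixRouter:
    """Hypothetical router: 'prefix/name' model strings -> cached backend objects."""

    def __init__(self, default_prefix: str) -> None:
        self.default_prefix = default_prefix
        self._factories: Dict[str, Callable[[str], Any]] = {}
        self._cache: Dict[str, Any] = {}

    def register(self, prefix: str, factory: Callable[[str], Any]) -> None:
        self._factories[prefix] = factory

    def get_model(self, model_name: Optional[str]) -> Any:
        if not model_name:
            raise ValueError("A model name is required")
        # Split on the first '/' only, so names that contain slashes survive intact
        prefix, sep, name = model_name.partition("/")
        if not sep:  # no prefix given: use the default backend
            prefix, name = self.default_prefix, model_name
        if prefix not in self._factories:
            raise ValueError(f"No backend registered for prefix {prefix!r}")
        key = f"{prefix}/{name}"
        if key not in self._cache:
            # Build each backend once, so e.g. one HTTP client is reused per model
            self._cache[key] = self._factories[prefix](name)
        return self._cache[key]


# Stand-in factories; real ones would return AnthropicModel(name) or OllamaModel(name)
router = PrefixRouter(default_prefix="anthropic")
router.register("anthropic", lambda name: ("anthropic", name))
router.register("ollama", lambda name: ("ollama", name))

print(router.get_model("ollama/llama3"))             # ('ollama', 'llama3')
print(router.get_model("claude-sonnet-4-20250514"))  # ('anthropic', 'claude-sonnet-4-20250514')
```

Subclassing `ModelProvider` and returning real `Model` instances from the factories would make this a drop-in alternative to `MultiModelProvider` when the set of model names is open-ended.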

Connecting a Local Ollama Model

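For local inference, you can implement a Model that calls Ollama's HTTP chat endpoint (/api/chat). This version forwards plain-text messages only and does not pass tools to the model.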

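import httpx
from agents.items import ModelResponse
from agents.models.interface import Model
from agents.usage import Usage
from openai.types.responses import ResponseOutputMessage, ResponseOutputText

class OllamaModel(Model):
    def __init__(self, model_name: str = "llama3", base_url: str = "http://localhost:11434"):
        self.model_name = model_name
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=120.0)

    async def get_response(self, system_instructions, input, model_settings, tools, output_schema, handoffs, tracing, **kwargs):
        messages = [{"role": "system", "content": system_instructions}] if system_instructions else []
        items = [{"role": "user", "content": input}] if isinstance(input, str) else input
        for item in items:
            # Input items are plain dicts; this sketch forwards text messages only (no tools)
            if item.get("type", "message") == "message":
                content = item["content"]
                if not isinstance(content, str):
                    content = "".join(part.get("text", "") for part in content)
                role = "system" if item["role"] == "developer" else item["role"]
                messages.append({"role": role, "content": content})

        resp = await self.client.post(
            f"{self.base_url}/api/chat",
            json={"model": self.model_name, "messages": messages, "stream": False},
        )
        resp.raise_for_status()
        return self._build_response(resp.json())

    def _build_response(self, data: dict) -> ModelResponse:
        # Wrap Ollama's reply text in the output-item type the runner expects
        message = ResponseOutputMessage(
            id="ollama-msg", type="message", role="assistant", status="completed",
            content=[ResponseOutputText(type="output_text", text=data["message"].get("content", ""), annotations=[])],
        )
        prompt, completion = data.get("prompt_eval_count", 0), data.get("eval_count", 0)
        usage = Usage(requests=1, input_tokens=prompt, output_tokens=completion, total_tokens=prompt + completion)
        return ModelResponse(output=[message], usage=usage, response_id=None)

    async def stream_response(self, *args, **kwargs):
        raise NotImplementedError("Streaming is not implemented for OllamaModel")
        yield  # unreachable; makes this an async generator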
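A `_build_response` for Ollama also needs to handle tool calls once you send a `tools` array (OpenAI-style function definitions) in the request. For models that support tools, Ollama's non-streaming `/api/chat` reply can carry `message.tool_calls`, where each entry has `function.name` and `function.arguments`, and the arguments arrive as a JSON object rather than a string. The SDK's function-call items expect a JSON string plus a call ID, so a converter has to re-serialize the arguments and supply an ID. A minimal sketch, with `ollama_tool_calls_to_items` and the `call_` ID scheme as hypothetical choices:

```python
import json
import uuid
from typing import Any


def ollama_tool_calls_to_items(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical helper: Ollama /api/chat reply -> SDK-style function_call dicts."""
    items = []
    for call in data.get("message", {}).get("tool_calls") or []:
        fn = call["function"]
        args = fn.get("arguments", {})
        items.append({
            "type": "function_call",
            # Reuse an ID if the server sent one; otherwise mint one so outputs can be matched
            "call_id": call.get("id") or f"call_{uuid.uuid4().hex[:12]}",
            "name": fn["name"],
            # The SDK expects arguments as a JSON string, not a dict
            "arguments": args if isinstance(args, str) else json.dumps(args),
        })
    return items


reply = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [{"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}],
    }
}
print(ollama_tool_calls_to_items(reply))
```

Inside `_build_response`, each dict could become an output item via `ResponseFunctionToolCall(**item)` from `openai.types.responses`; a reply without `tool_calls` yields an empty list and falls through to the text path.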

Wiring It Into Your Agent

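Once your provider is ready, give the agent a model name the provider knows, and pass the provider to the runner through RunConfig. If you only need one backend, you can skip the provider and pass a Model instance directly, as in Agent(model=AnthropicModel()).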

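import asyncio

from agents import Agent, RunConfig, Runner

agent = Agent(
    name="research_assistant",
    instructions="You are a helpful research assistant.",
    model="claude-sonnet",  # This name is resolved by the provider
)

async def main():
    result = await Runner.run(
        agent,
        input="Summarize the latest advances in quantum computing.",
        # run_config must be a RunConfig object, not a dict; add tracing_disabled=True
        # if you have no OpenAI key, since traces are exported to OpenAI by default
        run_config=RunConfig(model_provider=provider),
    )
    print(result.final_output)

asyncio.run(main())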

The agent code has no awareness of which vendor runs under the hood. Switching from Claude to a local Llama model means registering an OllamaModel with the provider and changing the agent's model name; the instructions, tools, and handoffs stay the same.

When to Use Custom Providers

Custom model providers solve real production problems: cost optimization by routing simple tasks to cheaper models, compliance by keeping sensitive data on local models, redundancy by failing over between vendors, and specialization by directing domain tasks to fine-tuned endpoints.
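Of those use cases, failover is the easiest to get subtly wrong. A minimal sketch, assuming each backend is wrapped as an async callable (in a real setup, something like the bound `get_response` of different `Model` instances): try backends in order and move on only for errors you classify as transient. `call_with_failover`, `TransientError`, and the fake backends are hypothetical names for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, 5xx, rate limits)."""


async def call_with_failover(
    backends: Sequence[tuple[str, Callable[[str], Awaitable[Any]]]],
    prompt: str,
) -> tuple[str, Any]:
    """Try each (name, backend) in order; return the first success and who served it."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, await backend(prompt)
        except TransientError as exc:
            # Record and fall through to the next vendor; other exceptions propagate
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All backends failed: " + "; ".join(errors))


async def flaky_cloud(prompt: str) -> str:
    raise TransientError("503 from upstream")


async def local_llm(prompt: str) -> str:
    return f"local answer to {prompt!r}"


result = asyncio.run(call_with_failover([("claude", flaky_cloud), ("ollama", local_llm)], "ping"))
print(result)  # ('ollama', "local answer to 'ping'")
```

Only errors marked transient trigger failover, so a bad request (for example, an invalid tool schema) fails loudly instead of being silently retried against another vendor.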


FAQ

Can I use tool calling with custom model providers?

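Yes, but your custom Model implementation must translate between the SDK's tool formats and your target LLM's. For Anthropic, each FunctionTool's name, description, and params_json_schema become the name, description, and input_schema of a Claude tool; tool_use blocks in the reply become function-call output items; and earlier calls and their results in the input history become tool_use and tool_result blocks. For local models without native tool calling, you can describe the tools in the system prompt and parse the model's output yourself.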
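For the last case, a local model without native tool calling, one common approach is to describe the tools in the system prompt and ask the model to reply with a JSON object when it wants a tool. A minimal sketch, assuming a hypothetical reply convention `{"tool": name, "arguments": {...}}` and tool objects with the same attribute names as the SDK's `FunctionTool` (`name`, `description`, `params_json_schema`). `ToolSpec`, `tools_prompt`, and `parse_tool_call` are illustrative names, not SDK API; any reply that doesn't parse as a known call is treated as a normal answer.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolSpec:
    """Stand-in using the same attribute names as the SDK's FunctionTool."""
    name: str
    description: str
    params_json_schema: dict


def tools_prompt(tools: list) -> str:
    """Render tool definitions as text to append to the system prompt."""
    lines = [
        "You can call the tools below. To call one, reply with ONLY a JSON object",
        'of the form {"tool": "<name>", "arguments": {...}} and nothing else.',
    ]
    for t in tools:
        lines.append(f"- {t.name}: {t.description} Parameters: {json.dumps(t.params_json_schema)}")
    return "\n".join(lines)


def parse_tool_call(text: str, tools: list) -> Optional[dict]:
    """Return an SDK-style function_call dict, or None if the reply is plain text."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    known = {t.name for t in tools}
    if not isinstance(obj, dict) or not isinstance(obj.get("tool"), str) or obj["tool"] not in known:
        return None
    args = obj.get("arguments", {})
    if not isinstance(args, dict):
        return None
    return {
        "type": "function_call",
        "call_id": f"call_{uuid.uuid4().hex[:12]}",  # minted: the model supplies no ID
        "name": obj["tool"],
        "arguments": json.dumps(args),  # the SDK expects a JSON string
    }


weather = ToolSpec(
    name="get_weather",
    description="Look up the current weather for a city.",
    params_json_schema={"type": "object", "properties": {"city": {"type": "string"}}},
)
print(tools_prompt([weather]))
print(parse_tool_call('{"tool": "get_weather", "arguments": {"city": "Oslo"}}', [weather]))
print(parse_tool_call("It is sunny in Oslo.", [weather]))  # None
```

Real models often wrap JSON in extra prose or code fences, so a production parser would usually need to be more forgiving than this strict `json.loads` check.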

Does streaming work with custom providers?

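The Model interface also defines stream_response, which Runner.run_streamed calls and which must yield the SDK's Responses-style stream events. Because both get_response and stream_response are abstract, a model that only supports Runner.run still needs a stream_response stub (for example, one that raises NotImplementedError); the SDK does not fall back to get_response for streamed runs.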

How do I handle authentication for multiple providers?

Each Model instance manages its own authentication. Store API keys in environment variables and read them in each model's constructor. Avoid passing keys through the agent layer; the Model classes, and the provider that hands them out, encapsulate all vendor-specific details.
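A minimal sketch of that advice, assuming a hypothetical `require_env` helper: each model reads its own key once in its constructor and fails at startup with a clear message rather than on the first request. The official Anthropic Python client already reads `ANTHROPIC_API_KEY` when no key is passed; an explicit check such as `anthropic.AsyncAnthropic(api_key=require_env("ANTHROPIC_API_KEY"))` just surfaces a missing key earlier. `ExampleVendorModel` and `EXAMPLE_VENDOR_API_KEY` are placeholders.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or fail fast with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the agent")
    return value


class ExampleVendorModel:
    """Hypothetical model class: owns its credentials; agent code never sees them."""

    def __init__(self, key_var: str = "EXAMPLE_VENDOR_API_KEY") -> None:
        self._api_key = require_env(key_var)


os.environ["EXAMPLE_VENDOR_API_KEY"] = "demo-key"  # demonstration only; set this in your shell
model = ExampleVendorModel()
print(model._api_key == "demo-key")  # True
```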

