LiteLLM Integration: Using Non-OpenAI Models with Agents SDK
Integrate Anthropic, Google, Mistral, and other LLM providers into OpenAI's Agents SDK using LiteLLM's unified interface with LitellmModel, provider prefix notation, and cross-provider tracing.
Why Use Non-OpenAI Models with the Agents SDK
The OpenAI Agents SDK provides an excellent framework for building multi-agent systems — structured outputs, handoffs, guardrails, and tracing. But sometimes you need a different model provider. Maybe your contract requires using Anthropic for certain workloads. Maybe a Mistral model outperforms on a specific language task. Maybe you want redundancy across providers for reliability.
LiteLLM provides a unified interface to 100+ LLM providers using the OpenAI API format. The Agents SDK's LitellmModel adapter lets you plug any LiteLLM-supported model into your agents while keeping the full SDK feature set.
Installing LiteLLM
LiteLLM support ships as an optional dependency of the Agents SDK:
pip install "openai-agents[litellm]"
This installs the LitellmModel adapter alongside the base SDK.
Basic LiteLLM Usage
The simplest way to use a non-OpenAI model is with the LitellmModel class:
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel
import asyncio
# Use Anthropic's Claude
claude_agent = Agent(
name="ClaudeAgent",
model=LitellmModel(model="anthropic/claude-sonnet-4-20250514"),
instructions="You are a helpful research assistant.",
)
# Use Google's Gemini
gemini_agent = Agent(
name="GeminiAgent",
model=LitellmModel(model="gemini/gemini-2.5-pro"),
instructions="You are a creative writing assistant.",
)
# Use Mistral
mistral_agent = Agent(
name="MistralAgent",
model=LitellmModel(model="mistral/mistral-large-latest"),
instructions="You are a code review assistant.",
)
async def main():
result = await Runner.run(claude_agent, input="Summarize recent advances in robotics.")
print(result.final_output)
asyncio.run(main())
The provider prefix notation (anthropic/, gemini/, mistral/) tells LiteLLM which provider to route to. LiteLLM handles the API translation automatically.
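The convention is easy to see in plain Python. The `split_provider` helper below is purely illustrative (it is not a LiteLLM API); it mirrors how a `provider/model` string divides at the first slash, with bare model names treated as OpenAI models:

```python
def split_provider(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name).

    A missing prefix defaults to "openai", mirroring how bare model
    names are treated as OpenAI models.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return ("openai", model)
    return (provider, name)

print(split_provider("anthropic/claude-sonnet-4-20250514"))
print(split_provider("gpt-4.1-mini"))
```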
Setting Up API Keys
Each provider needs its own API key. Set them as environment variables:
# OpenAI (for native SDK agents)
export OPENAI_API_KEY="sk-..."
# Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# Google
export GEMINI_API_KEY="..."
# Mistral
export MISTRAL_API_KEY="..."
# Azure OpenAI
export AZURE_API_KEY="..."
export AZURE_API_BASE="https://your-resource.openai.azure.com/"
export AZURE_API_VERSION="2024-12-01-preview"
LiteLLM reads these environment variables automatically — no additional configuration needed.
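A small preflight check can catch a missing key before an agent run fails mid-workflow. This is a sketch, not an SDK feature; adjust the variable names to the providers you actually use:

```python
import os

# Environment variable expected for each provider prefix (extend as needed)
REQUIRED_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def missing_keys(providers: list[str]) -> list[str]:
    """Return the environment variable names that are not set."""
    return [
        REQUIRED_KEYS[p]
        for p in providers
        if p in REQUIRED_KEYS and not os.environ.get(REQUIRED_KEYS[p])
    ]

# Example: warn before launching a Claude + Gemini workflow
for var in missing_keys(["anthropic", "gemini"]):
    print(f"Warning: {var} is not set")
```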
Mixing Providers in a Multi-Agent Workflow
The real power comes from mixing providers. Use each model where it excels:
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel
# Triage with a fast, cheap OpenAI model (native SDK)
triage_agent = Agent(
name="TriageAgent",
model="gpt-4.1-mini",
instructions=(
"Classify the request as: research, creative, code, or general. "
"Hand off to the appropriate specialist."
),
)
# Research with Claude (strong at analysis and long documents)
research_agent = Agent(
name="ResearchAgent",
model=LitellmModel(model="anthropic/claude-sonnet-4-20250514"),
instructions=(
"Conduct thorough research on the given topic. "
"Cite sources and provide balanced analysis."
),
)
# Creative writing with Gemini (strong generative capabilities)
creative_agent = Agent(
name="CreativeAgent",
model=LitellmModel(model="gemini/gemini-2.5-pro"),
instructions="Write creative, engaging content based on the brief.",
)
# Code review with GPT-4.1 (best tool-calling for code tools)
code_agent = Agent(
name="CodeAgent",
model="gpt-4.1",
instructions="Review code, identify issues, and suggest improvements.",
tools=[run_linter, run_tests, search_codebase],
)
# Wire up handoffs
triage_agent.handoffs = [research_agent, creative_agent, code_agent]
import asyncio

async def main():
    result = await Runner.run(
        triage_agent,
        input="Research the latest developments in Rust async runtimes.",
    )
    print(result.final_output)

asyncio.run(main())
The triage agent runs on GPT-4.1-mini for speed and cost. Research goes to Claude. Creative tasks go to Gemini. Code analysis stays on GPT-4.1 because it has the best tool-calling reliability.
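When you don't need LLM-driven handoffs, the same routing can be expressed as a plain dispatch table. The sketch below is ordinary Python, not an SDK feature; the category labels and model strings mirror the workflow above:

```python
# Deterministic alternative to LLM handoffs: route by category label.
ROUTES = {
    "research": "anthropic/claude-sonnet-4-20250514",
    "creative": "gemini/gemini-2.5-pro",
    "code": "gpt-4.1",
}

def pick_model(category: str, default: str = "gpt-4.1-mini") -> str:
    """Return the model for a category, falling back to the cheap default."""
    return ROUTES.get(category.strip().lower(), default)

print(pick_model("research"))  # routes to Claude
print(pick_model("general"))   # falls back to the cheap default
```

In practice you would feed `pick_model`'s result into `LitellmModel` (or use the string directly for native OpenAI models) when constructing the specialist agent.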
Tool Calling Across Providers
One important consideration: tool calling support varies by provider. LiteLLM translates the OpenAI tool format to each provider's native format, but some providers handle complex tool schemas better than others.
from agents import Agent, function_tool
from agents.extensions.models.litellm_model import LitellmModel
@function_tool
def search_database(query: str, limit: int = 10) -> str:
"""Search the product database."""
# Implementation here
return f"Found {limit} results for: {query}"
@function_tool
def get_user_profile(user_id: str) -> str:
"""Retrieve a user profile by ID."""
return f"Profile for user {user_id}: Premium tier, joined 2024"
# Claude handles tool calling well
claude_with_tools = Agent(
name="ClaudeToolAgent",
model=LitellmModel(model="anthropic/claude-sonnet-4-20250514"),
instructions="Help users find products and manage their accounts.",
tools=[search_database, get_user_profile],
)
If you encounter tool calling issues with a specific provider, you can implement a fallback pattern:
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel
from agents.exceptions import AgentsException
async def run_with_fallback(agent_input: str, tools: list):
"""Try the primary provider, fall back to OpenAI if tool calling fails."""
primary = Agent(
name="PrimaryAgent",
model=LitellmModel(model="anthropic/claude-sonnet-4-20250514"),
instructions="Process the request using available tools.",
tools=tools,
)
fallback = Agent(
name="FallbackAgent",
model="gpt-4.1",
instructions="Process the request using available tools.",
tools=tools,
)
try:
result = await Runner.run(primary, input=agent_input)
return result.final_output
except AgentsException:
result = await Runner.run(fallback, input=agent_input)
return result.final_output
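The try-then-fall-back idea generalizes to any ordered list of providers. The sketch below is provider-agnostic plain Python; the `runners` argument stands in for async functions that invoke your agents, and the stub runners are hypothetical placeholders:

```python
import asyncio

async def run_with_fallbacks(runners, agent_input: str) -> str:
    """Try each runner in order; return the first successful output.

    `runners` is a list of async callables taking the input string.
    Re-raises the last error if every provider fails.
    """
    last_error = None
    for run in runners:
        try:
            return await run(agent_input)
        except Exception as exc:  # with the SDK, catch AgentsException instead
            last_error = exc
    raise last_error

# Demo with stubs standing in for real agent calls
async def flaky_primary(text: str) -> str:
    raise RuntimeError("provider unavailable")

async def stable_fallback(text: str) -> str:
    return f"handled: {text}"

print(asyncio.run(run_with_fallbacks([flaky_primary, stable_fallback], "hi")))
# → handled: hi
```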
Tracing Across Providers
Tracing works seamlessly across providers. The Agents SDK trace captures spans regardless of which model backend is used:
from agents import Agent, Runner, trace
from agents.extensions.models.litellm_model import LitellmModel
async def multi_provider_workflow(query: str):
with trace(workflow_name="multi-provider-research"):
# Step 1: Classify with GPT-4.1-mini
classifier = Agent(
name="Classifier",
model="gpt-4.1-mini",
instructions="Classify the query topic into one category.",
)
classification = await Runner.run(classifier, input=query)
# Step 2: Research with Claude
researcher = Agent(
name="Researcher",
model=LitellmModel(model="anthropic/claude-sonnet-4-20250514"),
instructions="Research the topic thoroughly.",
)
research = await Runner.run(researcher, input=query)
# Step 3: Synthesize with GPT-5
synthesizer = Agent(
name="Synthesizer",
model="gpt-5",
instructions="Synthesize the research into a clear summary.",
)
result = await Runner.run(
synthesizer,
input=f"Research findings: {research.final_output}",
)
return result.final_output
The trace in the OpenAI dashboard shows all three agent spans with their respective models, token usage, and latency — giving you a complete picture of the cross-provider workflow.
Cost Comparison Across Providers
Track costs across providers to optimize your model mix:
PROVIDER_PRICING = {
"gpt-4.1": {"input": 2.00, "output": 8.00},
"gpt-4.1-mini": {"input": 0.40, "output": 1.60},
"anthropic/claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
"gemini/gemini-2.5-pro": {"input": 1.25, "output": 10.00},
"mistral/mistral-large-latest": {"input": 2.00, "output": 6.00},
}
def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
pricing = PROVIDER_PRICING.get(model, {"input": 5.0, "output": 15.0})
return (
(input_tokens / 1_000_000) * pricing["input"] +
(output_tokens / 1_000_000) * pricing["output"]
)
LiteLLM integration transforms the Agents SDK from an OpenAI-only framework into a truly provider-agnostic agent platform. Use it to leverage each provider's strengths, build redundancy into your systems, and optimize costs by routing to the most cost-effective model for each task. The key is to measure — run evaluations across providers for your specific use cases and let the data drive your model selection.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.