
CrewAI with Custom LLMs: Using Claude, Ollama, and Azure OpenAI

Configure CrewAI agents to use different LLM providers including Anthropic Claude, local Ollama models, and Azure OpenAI, with model parameter tuning and fallback strategies.

One Framework, Many Models

CrewAI defaults to OpenAI's GPT-4 for agent reasoning, but production systems often need different models for different agents. A research agent might use a large, capable model for complex reasoning while a formatting agent uses a smaller, faster model to keep costs down. Some organizations require Azure OpenAI for compliance, while others want fully local inference with Ollama.

CrewAI supports all of these scenarios through its LLM configuration system. You can set models at the agent level, meaning different agents in the same crew can use different providers and models.

Using Anthropic Claude

To use Claude with CrewAI, set your API key and configure the agent:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

from crewai import Agent, LLM

claude_llm = LLM(
    model="anthropic/claude-sonnet-4-20250514",
    temperature=0.7,
    max_tokens=4096,
)

analyst = Agent(
    role="Strategic Analyst",
    goal="Provide deep analytical insights on complex business problems",
    backstory="""You are a senior strategy consultant known for nuanced
    analysis that considers multiple perspectives.""",
    llm=claude_llm,
)

CrewAI uses LiteLLM under the hood, so the model string follows LiteLLM's naming convention: provider/model-name. Claude is particularly strong for tasks requiring careful reasoning, long-context analysis, and nuanced writing.
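The convention is easy to verify with a small helper — the provider is everything before the first slash. `split_model_string` below is a hypothetical illustration, not a CrewAI or LiteLLM API:

```python
# Hypothetical helper to illustrate LiteLLM's "provider/model-name" convention;
# CrewAI passes these strings through to LiteLLM unchanged.
def split_model_string(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name)."""
    provider, _, name = model.partition("/")
    return provider, name

print(split_model_string("anthropic/claude-sonnet-4-20250514"))
print(split_model_string("ollama/llama3.1:8b"))  # the tag after the colon stays in the name
```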

Using Ollama for Local Models

Ollama lets you run open-source models locally with zero API costs and full data privacy. First, install and start Ollama:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1:8b

Then configure your agent to use it:

from crewai import Agent, LLM

local_llm = LLM(
    model="ollama/llama3.1:8b",
    base_url="http://localhost:11434",
    temperature=0.5,
)

researcher = Agent(
    role="Research Assistant",
    goal="Gather and organize information efficiently",
    backstory="Diligent research assistant with strong organizational skills.",
    llm=local_llm,
)

Local models trade capability for privacy and cost. An 8B parameter model handles straightforward tasks like summarization and formatting well. For complex reasoning or tool use, larger models (70B or above) or cloud-hosted models perform significantly better.

Using Azure OpenAI

For enterprise deployments that require Azure's compliance certifications:


export AZURE_API_KEY="your-azure-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com/"
export AZURE_API_VERSION="2024-08-01-preview"

from crewai import Agent, LLM

# With the AZURE_* environment variables set above, the explicit
# api_key/base_url/api_version arguments are optional overrides.
azure_llm = LLM(
    model="azure/your-deployment-name",
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/",
    api_version="2024-08-01-preview",
)

compliance_agent = Agent(
    role="Compliance Reviewer",
    goal="Review documents for regulatory compliance",
    backstory="Expert in GDPR, HIPAA, and SOC 2 compliance requirements.",
    llm=azure_llm,
)

Azure deployments use your custom deployment name rather than OpenAI's standard model names. Ensure your deployment has sufficient tokens-per-minute (TPM) quota for agent workloads, which typically make many sequential calls.

Mixing Models in a Single Crew

One of CrewAI's strengths is per-agent model assignment. Use powerful models where reasoning quality matters and cheaper models where it does not:

from crewai import Agent, Task, Crew, Process, LLM

# Expensive, high-capability model for complex analysis
claude_llm = LLM(model="anthropic/claude-sonnet-4-20250514", temperature=0.7)

# Cost-effective model for formatting and simple tasks
gpt_mini = LLM(model="openai/gpt-4o-mini", temperature=0.3)

# Local model for data processing (no API cost)
local_llm = LLM(model="ollama/llama3.1:8b", base_url="http://localhost:11434")

analyst = Agent(
    role="Senior Analyst",
    goal="Perform deep strategic analysis",
    backstory="Expert analyst requiring nuanced reasoning.",
    llm=claude_llm,
)

data_processor = Agent(
    role="Data Processor",
    goal="Clean and structure raw data",
    backstory="Efficient data processing specialist.",
    llm=local_llm,
)

formatter = Agent(
    role="Report Formatter",
    goal="Format analysis into polished reports",
    backstory="Technical writer focused on presentation.",
    llm=gpt_mini,
)

This architecture optimizes the cost-quality tradeoff. The analyst needs the best reasoning capability. The data processor handles routine work locally. The formatter uses a small, fast model since it is mostly reorganizing existing content.
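A rough cost comparison shows why the split matters. The prices below are hypothetical per-million-output-token rates chosen for illustration — check your provider's current pricing before relying on them:

```python
# Hypothetical output-token prices in USD per million tokens -- illustrative
# only; check your provider's current price list.
PRICE_PER_M_TOKENS = {"claude-sonnet": 15.00, "gpt-4o-mini": 0.60, "ollama-local": 0.00}

def run_cost(assignments: dict, output_tokens: dict) -> float:
    """Estimate output-token cost of one crew run for a given agent -> model map."""
    return sum(
        output_tokens[agent] * PRICE_PER_M_TOKENS[model] / 1_000_000
        for agent, model in assignments.items()
    )

tokens = {"analyst": 4_000, "data_processor": 2_000, "formatter": 3_000}
all_claude = run_cost({a: "claude-sonnet" for a in tokens}, tokens)
mixed = run_cost(
    {"analyst": "claude-sonnet", "data_processor": "ollama-local", "formatter": "gpt-4o-mini"},
    tokens,
)
print(f"all Claude: ${all_claude:.4f}, mixed: ${mixed:.4f}")
```

Even at these toy numbers, the mixed assignment cuts the per-run cost by more than half while keeping the strongest model where reasoning quality matters.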

Model Parameters and Tuning

Fine-tune model behavior with LLM parameters:

from crewai import LLM

llm = LLM(
    model="openai/gpt-4o",
    temperature=0.2,
    max_tokens=4096,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    seed=42,
)

Key parameters to adjust:

  • temperature — Lower (0.1-0.3) for analytical tasks, higher (0.7-0.9) for creative tasks. Agent reasoning generally works best at 0.3-0.5.
  • max_tokens — Set based on expected output length. Too low and outputs get truncated. Too high and you waste money on unused capacity.
  • top_p — Alternative to temperature for controlling randomness. Usually keep at 0.9-1.0.
  • seed — Enables deterministic outputs for reproducible testing.
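The guidance above can be captured as reusable presets. The exact values here are starting points, not universal defaults — tune them against your own tasks:

```python
# Parameter presets following the tuning guidance above; values are
# illustrative starting points, not universal defaults.
ANALYTICAL = {"temperature": 0.2, "top_p": 0.9, "max_tokens": 2048, "seed": 42}
CREATIVE = {"temperature": 0.8, "top_p": 1.0, "max_tokens": 4096}

# With CrewAI installed, a preset unpacks straight into an LLM:
# from crewai import LLM
# llm = LLM(model="openai/gpt-4o", **ANALYTICAL)
```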

Implementing Fallback Strategies

Production systems need resilience. Implement fallbacks when a primary model is unavailable:

from crewai import Agent, LLM

def create_resilient_agent(role, goal, backstory):
    """Create an agent with fallback LLM configuration."""
    try:
        primary = LLM(model="anthropic/claude-sonnet-4-20250514")
        # Constructing an LLM does not contact the provider, so probe it
        # with a minimal call before committing to it.
        primary.call("Reply with OK.")
        return Agent(role=role, goal=goal, backstory=backstory, llm=primary)
    except Exception:
        fallback = LLM(model="openai/gpt-4o")
        return Agent(role=role, goal=goal, backstory=backstory, llm=fallback)

analyst = create_resilient_agent(
    role="Analyst",
    goal="Analyze market data",
    backstory="Senior market analyst.",
)

For more sophisticated fallback handling, use LiteLLM's built-in router with fallback configurations, which CrewAI supports natively.
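Short of adopting the router, the try-in-order pattern generalizes to any number of providers. `first_available` below is a hypothetical helper, not a CrewAI or LiteLLM API:

```python
from typing import Callable, Sequence, Tuple

def first_available(candidates: Sequence[Tuple[str, Callable[[], object]]]):
    """Return (name, instance) from the first factory that succeeds.

    Each factory should construct an LLM and probe it (e.g. with a one-line
    test call) so that an unreachable provider raises here, not mid-run.
    """
    errors = {}
    for name, factory in candidates:
        try:
            return name, factory()
        except Exception as exc:  # fall through to the next provider
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

With CrewAI, each candidate would be a lambda that builds an `LLM` and issues a probe call, ordered from most to least preferred provider.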

FAQ

Which LLM works best with CrewAI?

For most use cases, GPT-4o and Claude Sonnet provide the best balance of reasoning quality, tool use reliability, and cost. GPT-4o has a slight edge in tool calling, while Claude excels at nuanced analysis and longer outputs. For cost-sensitive tasks, GPT-4o-mini performs surprisingly well on straightforward work.

Can I use different models for the manager agent in hierarchical mode?

Yes. The manager agent is a regular Agent instance, so you can assign it any LLM. Use a stronger model for the manager since it handles the complex task of delegation, quality assessment, and coordination. Worker agents can use lighter models.

How do I handle rate limits when using multiple agents?

Set max_rpm (maximum requests per minute) on each agent to stay within your provider's rate limits. For example, max_rpm=10 limits the agent to 10 LLM calls per minute. Distribute your rate budget based on task complexity — give analytical agents more headroom and formatting agents less.
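Distributing the budget can be as simple as a proportional split. `allocate_rpm` is a hypothetical helper sketching the idea; each resulting value would be passed as the agent's `max_rpm`:

```python
def allocate_rpm(total_rpm: int, weights: dict) -> dict:
    """Split a provider-wide requests-per-minute budget across agents
    proportionally to weight (integer division, so shares round down)."""
    total_weight = sum(weights.values())
    return {name: max(1, total_rpm * w // total_weight) for name, w in weights.items()}

# Give the analyst the most headroom out of a 60 rpm budget:
budget = allocate_rpm(60, {"analyst": 3, "data_processor": 2, "formatter": 1})
# Each value would then be passed as Agent(..., max_rpm=budget[name]).
```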


#CrewAI #LLMConfiguration #Claude #Ollama #AzureOpenAI #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

