
CrewAI with Custom LLMs: Using Claude, Ollama, and Azure OpenAI

Configure CrewAI agents to use different LLM providers including Anthropic Claude, local Ollama models, and Azure OpenAI, with model parameter tuning and fallback strategies.

One Framework, Many Models

CrewAI defaults to OpenAI's GPT-4 for agent reasoning, but production systems often need different models for different agents. A research agent might use a large, capable model for complex reasoning while a formatting agent uses a smaller, faster model to keep costs down. Some organizations require Azure OpenAI for compliance, while others want fully local inference with Ollama.

CrewAI supports all of these scenarios through its LLM configuration system. You can set models at the agent level, meaning different agents in the same crew can use different providers and models.

Using Anthropic Claude

To use Claude with CrewAI, set your API key and configure the agent:

export ANTHROPIC_API_KEY="sk-ant-your-key-here"

from crewai import Agent, LLM

claude_llm = LLM(
    model="anthropic/claude-sonnet-4-20250514",
    temperature=0.7,
    max_tokens=4096,
)

analyst = Agent(
    role="Strategic Analyst",
    goal="Provide deep analytical insights on complex business problems",
    backstory="""You are a senior strategy consultant known for nuanced
    analysis that considers multiple perspectives.""",
    llm=claude_llm,
)

CrewAI uses LiteLLM under the hood, so the model string follows LiteLLM's naming convention: provider/model-name. Claude is particularly strong for tasks requiring careful reasoning, long-context analysis, and nuanced writing.
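The convention is easy to verify with a small helper — the provider is everything before the first slash. `split_model_string` below is a hypothetical illustration, not a CrewAI or LiteLLM API:

```python
# Hypothetical helper to illustrate LiteLLM's "provider/model-name" convention;
# CrewAI passes these strings through to LiteLLM unchanged.
def split_model_string(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name)."""
    provider, _, name = model.partition("/")
    return provider, name

print(split_model_string("anthropic/claude-sonnet-4-20250514"))
print(split_model_string("ollama/llama3.1:8b"))  # the tag after the colon stays in the name
```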

Using Ollama for Local Models

Ollama lets you run open-source models locally with zero API costs and full data privacy. First, install and start Ollama:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1:8b

Then configure your agent to use it:

from crewai import Agent, LLM

local_llm = LLM(
    model="ollama/llama3.1:8b",
    base_url="http://localhost:11434",
    temperature=0.5,
)

researcher = Agent(
    role="Research Assistant",
    goal="Gather and organize information efficiently",
    backstory="Diligent research assistant with strong organizational skills.",
    llm=local_llm,
)

Local models trade capability for privacy and cost. An 8B parameter model handles straightforward tasks like summarization and formatting well. For complex reasoning or tool use, larger models (70B or above) or cloud-hosted models perform significantly better.

Using Azure OpenAI

For enterprise deployments that require Azure's compliance certifications:


export AZURE_API_KEY="your-azure-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com/"
export AZURE_API_VERSION="2024-08-01-preview"

from crewai import Agent, LLM

# With the AZURE_* environment variables set above, the explicit
# api_key/base_url/api_version arguments are optional overrides.
azure_llm = LLM(
    model="azure/your-deployment-name",
    api_key="your-azure-key",
    base_url="https://your-resource.openai.azure.com/",
    api_version="2024-08-01-preview",
)

compliance_agent = Agent(
    role="Compliance Reviewer",
    goal="Review documents for regulatory compliance",
    backstory="Expert in GDPR, HIPAA, and SOC 2 compliance requirements.",
    llm=azure_llm,
)

Azure deployments use your custom deployment name rather than OpenAI's standard model names. Ensure your deployment has sufficient tokens-per-minute (TPM) quota for agent workloads, which typically make many sequential calls.

Mixing Models in a Single Crew

One of CrewAI's strengths is per-agent model assignment. Use powerful models where reasoning quality matters and cheaper models where it does not:

from crewai import Agent, Task, Crew, Process, LLM

# Expensive, high-capability model for complex analysis
claude_llm = LLM(model="anthropic/claude-sonnet-4-20250514", temperature=0.7)

# Cost-effective model for formatting and simple tasks
gpt_mini = LLM(model="openai/gpt-4o-mini", temperature=0.3)

# Local model for data processing (no API cost)
local_llm = LLM(model="ollama/llama3.1:8b", base_url="http://localhost:11434")

analyst = Agent(
    role="Senior Analyst",
    goal="Perform deep strategic analysis",
    backstory="Expert analyst requiring nuanced reasoning.",
    llm=claude_llm,
)

data_processor = Agent(
    role="Data Processor",
    goal="Clean and structure raw data",
    backstory="Efficient data processing specialist.",
    llm=local_llm,
)

formatter = Agent(
    role="Report Formatter",
    goal="Format analysis into polished reports",
    backstory="Technical writer focused on presentation.",
    llm=gpt_mini,
)

This architecture optimizes the cost-quality tradeoff. The analyst needs the best reasoning capability. The data processor handles routine work locally. The formatter uses a small, fast model since it is mostly reorganizing existing content.
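A rough cost comparison shows why the split matters. The prices below are hypothetical per-million-output-token rates chosen for illustration — check your provider's current pricing before relying on them:

```python
# Hypothetical output-token prices in USD per million tokens -- illustrative
# only; check your provider's current price list.
PRICE_PER_M_TOKENS = {"claude-sonnet": 15.00, "gpt-4o-mini": 0.60, "ollama-local": 0.00}

def run_cost(assignments: dict, output_tokens: dict) -> float:
    """Estimate output-token cost of one crew run for a given agent -> model map."""
    return sum(
        output_tokens[agent] * PRICE_PER_M_TOKENS[model] / 1_000_000
        for agent, model in assignments.items()
    )

tokens = {"analyst": 4_000, "data_processor": 2_000, "formatter": 3_000}
all_claude = run_cost({a: "claude-sonnet" for a in tokens}, tokens)
mixed = run_cost(
    {"analyst": "claude-sonnet", "data_processor": "ollama-local", "formatter": "gpt-4o-mini"},
    tokens,
)
print(f"all Claude: ${all_claude:.4f}, mixed: ${mixed:.4f}")
```

Even at these toy numbers, the mixed assignment cuts the per-run cost by more than half while keeping the strongest model where reasoning quality matters.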

Model Parameters and Tuning

Fine-tune model behavior with LLM parameters:

from crewai import LLM

llm = LLM(
    model="openai/gpt-4o",
    temperature=0.2,
    max_tokens=4096,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    seed=42,
)

Key parameters to adjust:

  • temperature — Lower (0.1-0.3) for analytical tasks, higher (0.7-0.9) for creative tasks. Agent reasoning generally works best at 0.3-0.5.
  • max_tokens — Set based on expected output length. Too low and outputs get truncated. Too high and you waste money on unused capacity.
  • top_p — Alternative to temperature for controlling randomness. Usually keep at 0.9-1.0.
  • seed — Enables deterministic outputs for reproducible testing.
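The guidance above can be captured as reusable presets. The exact values here are starting points, not universal defaults — tune them against your own tasks:

```python
# Parameter presets following the tuning guidance above; values are
# illustrative starting points, not universal defaults.
ANALYTICAL = {"temperature": 0.2, "top_p": 0.9, "max_tokens": 2048, "seed": 42}
CREATIVE = {"temperature": 0.8, "top_p": 1.0, "max_tokens": 4096}

# With CrewAI installed, a preset unpacks straight into an LLM:
# from crewai import LLM
# llm = LLM(model="openai/gpt-4o", **ANALYTICAL)
```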

Implementing Fallback Strategies

Production systems need resilience. Implement fallbacks when a primary model is unavailable:

from crewai import Agent, LLM

def create_resilient_agent(role, goal, backstory):
    """Create an agent with fallback LLM configuration."""
    try:
        primary = LLM(model="anthropic/claude-sonnet-4-20250514")
        # Constructing an LLM does not contact the provider, so probe it
        # with a minimal call before committing to it.
        primary.call("Reply with OK.")
        return Agent(role=role, goal=goal, backstory=backstory, llm=primary)
    except Exception:
        fallback = LLM(model="openai/gpt-4o")
        return Agent(role=role, goal=goal, backstory=backstory, llm=fallback)

analyst = create_resilient_agent(
    role="Analyst",
    goal="Analyze market data",
    backstory="Senior market analyst.",
)

For more sophisticated fallback handling, use LiteLLM's built-in router with fallback configurations, which CrewAI supports natively.
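Short of adopting the router, the try-in-order pattern generalizes to any number of providers. `first_available` below is a hypothetical helper, not a CrewAI or LiteLLM API:

```python
from typing import Callable, Sequence, Tuple

def first_available(candidates: Sequence[Tuple[str, Callable[[], object]]]):
    """Return (name, instance) from the first factory that succeeds.

    Each factory should construct an LLM and probe it (e.g. with a one-line
    test call) so that an unreachable provider raises here, not mid-run.
    """
    errors = {}
    for name, factory in candidates:
        try:
            return name, factory()
        except Exception as exc:  # fall through to the next provider
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")
```

With CrewAI, each candidate would be a lambda that builds an `LLM` and issues a probe call, ordered from most to least preferred provider.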

FAQ

Which LLM works best with CrewAI?

For most use cases, GPT-4o and Claude Sonnet provide the best balance of reasoning quality, tool use reliability, and cost. GPT-4o has a slight edge in tool calling, while Claude excels at nuanced analysis and longer outputs. For cost-sensitive tasks, GPT-4o-mini performs surprisingly well on straightforward work.

Can I use different models for the manager agent in hierarchical mode?

Yes. The manager agent is a regular Agent instance, so you can assign it any LLM. Use a stronger model for the manager since it handles the complex task of delegation, quality assessment, and coordination. Worker agents can use lighter models.

How do I handle rate limits when using multiple agents?

Set max_rpm (maximum requests per minute) on each agent to stay within your provider's rate limits. For example, max_rpm=10 limits the agent to 10 LLM calls per minute. Distribute your rate budget based on task complexity — give analytical agents more headroom and formatting agents less.
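Distributing the budget can be as simple as a proportional split. `allocate_rpm` is a hypothetical helper sketching the idea; each resulting value would be passed as the agent's `max_rpm`:

```python
def allocate_rpm(total_rpm: int, weights: dict) -> dict:
    """Split a provider-wide requests-per-minute budget across agents
    proportionally to weight (integer division, so shares round down)."""
    total_weight = sum(weights.values())
    return {name: max(1, total_rpm * w // total_weight) for name, w in weights.items()}

# Give the analyst the most headroom out of a 60 rpm budget:
budget = allocate_rpm(60, {"analyst": 3, "data_processor": 2, "formatter": 1})
# Each value would then be passed as Agent(..., max_rpm=budget[name]).
```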


#CrewAI #LLMConfiguration #Claude #Ollama #AzureOpenAI #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

