By Sagar Shankaran, Founder of CallSphere
smolagents fits its entire core in under 1,000 lines and beats GPT-4 Turbo by 6x on GAIA. Here is what makes the CodeAgent loop tick and when to pick it for production.
Key takeaways
TL;DR: smolagents is the most readable agent framework in the ecosystem. The core loop is under 1,000 lines, the CodeAgent writes Python instead of JSON tool calls, and on GAIA it scores 44.2% versus GPT-4 Turbo's 7%. If you want to understand what your agent is doing, start here.
flowchart LR
Repo[GitHub repo] --> CI[GitHub Actions]
CI --> Eval[Agent eval suite · PromptFoo]
Eval -->|pass| Deploy[Deploy]
Eval -->|fail| Block[Block PR]
Deploy --> Prod[Production agent]
Prod --> Trace[(LangSmith trace)]
Trace --> Eval
Hugging Face shipped smolagents in late 2024 as a deliberate counter-reaction to the megaframework era. The pitch: an agent doesn't need a planner, a router, a memory module, and a 50-class type hierarchy. It needs a loop that picks a tool and runs it. agents.py weighs in at under a thousand lines. The library went from 3k stars in early 2025 to 26k+ by April 2026.
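To make "a loop that picks a tool and runs it" concrete, here is a toy sketch in plain Python. It is not smolagents' code: `scripted_model`, `run_agent`, and the `search` tool are hypothetical stand-ins, and the "model" is a fixed script rather than an LLM.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for an LLM: it picks a (tool_name, argument) pair
# based on how many steps have already been taken.
def scripted_model(task: str, history: List[str]) -> Tuple[str, str]:
    script = [("search", task), ("final_answer", "done after one search")]
    return script[min(len(history), len(script) - 1)]

def run_agent(
    model: Callable[[str, List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> Optional[str]:
    """Pick a tool, run it, record the observation; stop on a final answer or at max_steps."""
    history: List[str] = []
    for _ in range(max_steps):
        tool_name, argument = model(task, history)
        if tool_name == "final_answer":
            return argument
        observation = tools[tool_name](argument)
        history.append(f"{tool_name}({argument!r}) -> {observation}")
    return None  # step budget exhausted without a final answer

tools = {"search": lambda query: f"3 results for {query!r}"}
answer = run_agent(scripted_model, tools, "smolagents GAIA score")
print(answer)
```

Returning `None` when the budget runs out is one possible design choice; it keeps a confused agent from looping forever and makes the failure visible to the caller.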
A smolagent does exactly this:
max_steps or a final answer.
That's it. The two flagship agent classes are:
CodeAgent: the model emits Python code directly. The framework executes it in a sandbox and returns the result. Research shows code agents use 30% fewer steps than JSON tool-calling agents and score higher on hard benchmarks.
ToolCallingAgent: the classic JSON tool-call loop, for models that don't write Python well or for environments where you can't sandbox.
Letting a model emit arbitrary Python and exec()-ing it on your laptop is a textbook prompt-injection RCE. smolagents ships first-class executors for E2B, Modal, Blaxel, Docker, and the Pyodide + Deno WebAssembly sandbox. You opt in with one parameter:
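from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, VisitWebpageTool

# Assumption: the two tools are smolagents' built-in search and page-fetch tools.
search_tool = DuckDuckGoSearchTool()
fetch_tool = VisitWebpageTool()

agent = CodeAgent(
    tools=[search_tool, fetch_tool],
    model=HfApiModel("Qwen/Qwen2.5-72B-Instruct"),  # newer smolagents releases call this InferenceClientModel
    executor_type="e2b",  # or "modal", "docker", "wasm"
    additional_authorized_imports=["pandas", "numpy"],
)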
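The additional_authorized_imports parameter behaves like an import allowlist for the code the model writes. The sketch below shows the general idea using only the standard library's `ast` module. It is an illustration, not how smolagents enforces the list internally, and `disallowed_imports` is a hypothetical helper.

```python
import ast
from typing import List, Set

def disallowed_imports(source: str, allowed: Set[str]) -> List[str]:
    """Return top-level module names imported by `source` that are not in `allowed`."""
    blocked = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name; this sketch ignores them.
            names = [node.module] if node.module else []
        else:
            continue
        for name in names:
            root = name.split(".")[0]  # "os.path" is governed by "os"
            if root not in allowed:
                blocked.append(root)
    return blocked

allowed = {"pandas", "numpy"}
generated = "import pandas as pd\nimport json\nfrom os import path\n"
blocked = disallowed_imports(generated, allowed)
print(blocked)
```

A static check like this only catches plain import statements, which is one reason an allowlist complements a sandboxed executor rather than replacing it.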
For CallSphere internal use we lean on Modal for batch agent workloads — it gives us per-task containers with GPU optionality and no idle cost.
Pick smolagents when:
Skip smolagents when:
CallSphere's voice surface is OpenAI Realtime + the OpenAI Agents SDK orchestrator (37 specialist agents, 90+ tools, 115+ DB tables across 6 verticals). For non-voice batch workloads — the GTM scrapers, the SEO content generators, the affiliate-fraud detectors — we run smolagents CodeAgents on Modal because the loop is auditable and the cost per task is pennies. The GAIA-style "open-ended research with tools" pattern is exactly what these batch jobs need.
Pricing: $149 Starter, $499 Growth, $1499 Scale, with a 14-day trial and a 22% affiliate program.
pip install smolagents[e2b] and set E2B_API_KEY.
@tool: type hints become the JSON schema automatically.
HfApiModel, LiteLLMModel, OpenAIServerModel, or TransformersModel for local.
max_steps low (5-8) for online workloads, higher (20+) for offline research.
validate_output(...)).
from smolagents import CodeAgent, LiteLLMModel, tool

@tool
def fetch_competitor_pricing(domain: str) -> str:
    """Fetch a competitor's public pricing page and return the raw HTML.

    Args:
        domain: Bare domain of the competitor site, e.g. "example.com".
    """
    import httpx  # imported inside the tool body so the tool stays self-contained
    return httpx.get(f"https://{domain}/pricing", timeout=10).text

@tool
def parse_table(html: str, css_selector: str) -> list[dict]:
    """Parse an HTML table into a list of dicts using the given CSS selector.

    Args:
        html: Raw HTML that contains the table.
        css_selector: CSS selector that matches the table rows to parse.
    """
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")
    # One dict per matched row, keyed by each cell's data-key attribute (or its text).
    return [{c.get("data-key", c.text): c.text for c in row.find_all("td")}
            for row in soup.select(css_selector)]

agent = CodeAgent(
    tools=[fetch_competitor_pricing, parse_table],
    model=LiteLLMModel("claude-sonnet-4"),  # LiteLLM usually expects a provider-prefixed model id
    executor_type="modal",
    additional_authorized_imports=["pandas", "json"],
    max_steps=8,
)
result = agent.run("Compare CallSphere's $149/$499/$1499 tiers to a competitor's published pricing.")
The CodeAgent's plan, in our traces, looks something like: fetch the page, parse the pricing table, build a pandas DataFrame, compute deltas, return a summary. The model never emits JSON tool calls — it writes Python that orchestrates the tools. That's the 30% step-count win.
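A toy comparison can show where the step savings come from. In the sketch below, the JSON-style runner spends one model turn per tool call, while the code-style runner executes a single snippet that chains both tools. Everything here is hypothetical (`fetch`, `parse`, and both runners), and the turn counts illustrate the mechanism; they are not a benchmark.

```python
def fetch(domain: str) -> str:
    """Toy tool: pretend to download a pricing page."""
    return "starter=99,growth=299"

def parse(text: str) -> dict:
    """Toy tool: turn 'k=v,k=v' into a dict of ints."""
    return {key: int(value) for key, value in (pair.split("=") for pair in text.split(","))}

TOOLS = {"fetch": fetch, "parse": parse}

def run_json_style(planned_calls):
    """One model turn per tool call; an argument of None means 'use the last observation'."""
    observation, turns = None, 0
    for tool_name, argument in planned_calls:
        turns += 1  # each JSON tool call costs one model round trip
        observation = TOOLS[tool_name](observation if argument is None else argument)
    return observation, turns

def run_code_style(snippet: str):
    """One model turn total: the snippet composes the tools itself."""
    namespace = dict(TOOLS)
    exec(snippet, namespace)  # toy only: real agent code belongs in a sandbox
    return namespace["result"], 1

json_result, json_turns = run_json_style([("fetch", "example.com"), ("parse", None)])
code_result, code_turns = run_code_style('result = parse(fetch("example.com"))')
print(json_turns, code_turns)
```

Both runners produce the same dictionary; the difference is only in how many times the model has to be consulted to get there.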
Three things bit us when we first shipped smolagents:
additional_authorized_imports is a kill-switch. If you don't list a module the agent needs (e.g., json), every code execution that imports it fails. Start permissive in dev, tighten in prod.
max_steps is critical. Without a tight cap a confused agent will burn dollars in a loop. We default to 5-8 for online jobs and gate higher values behind a feature flag.
Is the CodeAgent really safe? Only if you sandbox. Never run a CodeAgent's code in your application process. The framework will warn you, but it's your responsibility to set executor_type.
Can I use smolagents with MCP servers? Yes — there are MCP client wrappers in the community. As of April 2026 the official smolagents repo has examples that mount MCP tools as @tool functions.
How does it compare to Pydantic AI? Pydantic AI is type-first; smolagents is loop-first. If you want strict structured outputs from every step, Pydantic AI. If you want a transparent ReAct loop, smolagents.
How do I evaluate it? Use Promptfoo or Phoenix Evals; smolagents emits standard OpenInference spans.
Does smolagents support multi-agent topologies? Yes — agents can spawn and call other agents as tools. The pattern is simpler than CrewAI's role-playing model but plenty for most production cases.
What model picks should I default to? GPT-5 or Claude Sonnet 4 for production CodeAgents. For local, Qwen2.5-72B-Instruct is the sweet spot of cost and Python-writing quality.
How do I plug in OpenTelemetry? Use openinference-instrumentation-smolagents. One line, traces flow to Phoenix, Langfuse, or your OTel backend.