Building a Calculator Tool for AI Agents: Step-by-Step Tutorial

Walk through building a complete calculator tool for an AI agent from scratch. Covers schema definition, safe expression evaluation, result handling, and integration with the agent loop.

Why Build a Calculator Tool?

LLMs are notoriously unreliable at arithmetic. They can set up equations correctly but frequently miscalculate the result. A calculator tool solves this by offloading the computation to deterministic code. It is also one of the simplest tools to build, making it an ideal starting point for understanding the full tool-calling lifecycle.

This tutorial walks through building a calculator tool, registering it with an agent, and handling the execution loop.

Step 1: Define the Tool Schema

The schema tells the LLM what the tool does and what parameters it accepts:

calculator_schema = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression and return the numeric result. Use this for any arithmetic, percentages, or mathematical calculations. Input must be a valid Python math expression.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A mathematical expression to evaluate, e.g. '(25 * 4) + 17' or '150 * 0.15'. Use Python syntax for operations."
                }
            },
            "required": ["expression"]
        }
    }
}

The description explicitly says "Python math expression" to guide the LLM toward valid syntax like ** for exponents instead of ^.
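To see why the hint matters, here is a small sketch of how Python treats the two operators. The helper name operator_name is made up for this illustration. Since the SAFE_OPERATORS table in Step 2 has no entry for BitXor, an expression written with ^ would come back as an "Unsupported operator" error rather than a wrong number.

```python
import ast

# In Python, ^ is bitwise XOR, not exponentiation
print(2 ^ 3)   # XOR of 0b10 and 0b11
print(2 ** 3)  # 2 raised to the power 3

def operator_name(expression: str) -> str:
    """Return the AST operator class name at the top of a binary expression."""
    tree = ast.parse(expression, mode="eval")
    return type(tree.body.op).__name__

print(operator_name("2 ^ 3"))   # BitXor
print(operator_name("2 ** 3"))  # Pow
```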

Step 2: Implement the Tool Function

Never use eval() on untrusted input. Instead, use Python's ast module to parse the expression safely:

import ast
import operator
import math

SAFE_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}

SAFE_FUNCTIONS = {
    "sqrt": math.sqrt,
    "abs": abs,
    "round": round,
    "min": min,
    "max": max,
}

def safe_eval(node):
    if isinstance(node, ast.Expression):
        return safe_eval(node.body)
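    elif isinstance(node, ast.Constant):
        # bool is a subclass of int, so exclude True/False explicitly
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
        raise ValueError(f"Unsupported constant: {node.value!r}")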
    elif isinstance(node, ast.BinOp):
        left = safe_eval(node.left)
        right = safe_eval(node.right)
        op_func = SAFE_OPERATORS.get(type(node.op))
        if op_func is None:
            raise ValueError(f"Unsupported operator: {type(node.op).__name__}")
        return op_func(left, right)
    elif isinstance(node, ast.UnaryOp):
        operand = safe_eval(node.operand)
        op_func = SAFE_OPERATORS.get(type(node.op))
        if op_func is None:
            raise ValueError(f"Unsupported unary operator: {type(node.op).__name__}")
        return op_func(operand)
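    elif isinstance(node, ast.Call):
        # Only whitelisted bare names may be called; keyword arguments are
        # rejected rather than silently dropped
        if (isinstance(node.func, ast.Name)
                and node.func.id in SAFE_FUNCTIONS
                and not node.keywords):
            args = [safe_eval(arg) for arg in node.args]
            return SAFE_FUNCTIONS[node.func.id](*args)
        raise ValueError("Unsupported function call")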
    else:
        raise ValueError(f"Unsupported expression type: {type(node).__name__}")

def calculate(expression: str) -> str:
    try:
        tree = ast.parse(expression, mode="eval")
        result = safe_eval(tree)
        return str(result)
    # ArithmeticError covers both ZeroDivisionError and OverflowError (e.g. 10.0 ** 400)
    except (ValueError, SyntaxError, TypeError, ArithmeticError) as e:
        return f"Error: {e}"

This evaluator supports basic arithmetic, exponentiation, and a whitelist of safe functions without exposing the system to code injection.
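Whitelisting operators blocks injected code, but it does not cap how expensive an allowed operation can be: an expression like 9**9**9 can tie up CPU and memory for a long time. One possible mitigation, sketched below, is to swap operator.pow for a guarded version in SAFE_OPERATORS. The name bounded_pow and both limits are assumptions for illustration, not part of the original tutorial code.

```python
import operator

MAX_EXPONENT = 1000   # assumed limit; tune for your use case
MAX_BASE = 10 ** 6    # assumed limit on the magnitude of the base

def bounded_pow(base, exponent):
    """Like operator.pow, but refuses operands that could produce huge results."""
    if abs(exponent) > MAX_EXPONENT:
        raise ValueError("Exponent too large")
    if abs(base) > MAX_BASE:
        raise ValueError("Base too large for exponentiation")
    return operator.pow(base, exponent)

print(bounded_pow(2, 10))
try:
    # 9 ** 9 is cheap to compute, but using it as an exponent is not
    bounded_pow(9, 9 ** 9)
except ValueError as e:
    print(f"Error: {e}")
```

The fixed base limit is a blunt trade-off: it also rejects harmless inputs such as 10000000 ** 2, so a real deployment may prefer a limit on the estimated size of the result.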

Step 3: Wire It Into the Agent Loop

Here is a complete agent loop using the OpenAI API that calls the calculator tool:

import json

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def run_agent(user_message: str) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant. Use the calculate tool for any math."},
        {"role": "user", "content": user_message}
    ]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=[calculator_schema],
        )
        msg = response.choices[0].message
        messages.append(msg)  # keep the assistant turn, including any tool calls

        if msg.tool_calls:
            for tool_call in msg.tool_calls:
                # Arguments arrive as a JSON-encoded string
                args = json.loads(tool_call.function.arguments)
                result = calculate(args["expression"])

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })
        else:
            return msg.content

answer = run_agent("What is 15% tip on a $247.50 dinner bill split 3 ways?")
print(answer)

The agent loop continues until the LLM stops making tool calls and returns a text response. Each tool call result is appended with the matching tool_call_id so the LLM can correlate results to requests.
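The loop above assumes the model always sends well-formed JSON containing an expression key. A slightly more defensive variant might route each call through a small dispatcher that turns every failure into an error string. The sketch below is one way to do that; dispatch_tool_call, registry, and fake_calculate are hypothetical names, and fake_calculate only stands in for the real calculate() so the example runs on its own.

```python
import json
from typing import Callable, Dict

def dispatch_tool_call(name: str, raw_arguments: str,
                       registry: Dict[str, Callable[..., str]]) -> str:
    """Run one tool call and always return a string the model can read."""
    handler = registry.get(name)
    if handler is None:
        return f"Error: Unknown tool '{name}'"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return f"Error: Arguments are not valid JSON ({e.msg})"
    if not isinstance(args, dict):
        return "Error: Arguments must be a JSON object"
    try:
        return handler(**args)
    except TypeError as e:
        # Raised for missing or unexpected argument names
        # (and for any TypeError the handler itself lets escape)
        return f"Error: Bad arguments ({e})"

# Stand-in for the real calculate() so this sketch is self-contained
def fake_calculate(expression: str) -> str:
    return f"evaluated {expression}"

registry = {"calculate": fake_calculate}
print(dispatch_tool_call("calculate", '{"expression": "2 + 2"}', registry))
print(dispatch_tool_call("calculate", '{"expr": "2 + 2"}', registry))
print(dispatch_tool_call("calculate", '{"expression": ', registry))
```

Inside the agent loop, the result of dispatch_tool_call(tool_call.function.name, tool_call.function.arguments, registry) would then be used as the content of the tool message.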

Step 4: Handle Edge Cases

Your calculator will receive unexpected inputs. Build robustness into the tool function:

def calculate(expression: str) -> str:
    if not expression or not expression.strip():
        return "Error: Empty expression"
    if len(expression) > 500:
        return "Error: Expression too long"
    try:
        tree = ast.parse(expression, mode="eval")
        result = safe_eval(tree)
        if isinstance(result, float) and (math.isinf(result) or math.isnan(result)):
            return "Error: Result is infinity or undefined"
        return str(round(result, 10))
    except Exception as e:
        return f"Error: {str(e)}"

Returning a clear error string instead of raising an exception lets the LLM recover by adjusting the expression and trying again.
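The result-handling part of that function can also be pulled out and exercised on its own. The sketch below uses a hypothetical helper, format_result, and adds one case the code above handles only indirectly: a negative number raised to a fractional power yields a complex number in Python, and round() does not accept complex values, so it is reported explicitly here.

```python
import math

def format_result(result) -> str:
    """Turn a raw evaluation result into a string that is safe to show the model."""
    if isinstance(result, complex):
        return "Error: Result is a complex number"
    if isinstance(result, float):
        if math.isinf(result) or math.isnan(result):
            return "Error: Result is infinity or undefined"
        # Trim binary floating-point noise such as 0.30000000000000004
        result = round(result, 10)
    return str(result)

print(format_result(0.1 + 0.2))     # float noise is rounded away
print(format_result(1e308 * 10))    # float overflow produces inf
print(format_result((-8) ** 0.5))   # complex result
```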

FAQ

Why not just use Python eval() for the calculator?

Using eval() on LLM-generated strings is a critical security vulnerability. The LLM could produce expressions like __import__('os').system('rm -rf /') either through prompt injection or a malformed response. The AST-based evaluator restricts execution to pure mathematical operations.
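As a rough illustration of why the AST approach refuses such input, the snippet below parses a harmless variant of that payload without executing it. The top-level node is a Call whose target is an Attribute rather than a bare name, and the evaluator in Step 2 only accepts calls to bare names listed in SAFE_FUNCTIONS.

```python
import ast

payload = "__import__('os').system('echo hi')"

# ast.parse only builds a syntax tree; nothing in the payload is executed
tree = ast.parse(payload, mode="eval")
call = tree.body

print(type(call).__name__)       # Call
print(type(call.func).__name__)  # Attribute
```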

Can the LLM call the calculator multiple times in one turn?

Yes. If the model generates multiple tool_calls in a single response, you should execute all of them and return all results. The model might break a complex calculation into steps, calling the calculator for each one.

How do I test that my tool schema works correctly?

Send test prompts that should trigger tool calls and verify the LLM generates valid arguments. Common failure modes include the LLM using ^ for exponents instead of **, or passing expressions that contain variables. Address them directly in your tool description, for example by stating that exponents use ** and that every value must be a literal number.
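One way to automate part of that check is to lint the expressions the model generates before evaluating them. The sketch below is an assumption about how such a check could look, not part of the tutorial's calculator: lint_expression and ALLOWED_NAMES are hypothetical names, with the allowed set mirroring the functions whitelisted in Step 2.

```python
import ast

# Mirrors the function whitelist from Step 2
ALLOWED_NAMES = {"sqrt", "abs", "round", "min", "max"}

def lint_expression(expression: str) -> list:
    """Return a list of likely problems in a model-generated expression."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return ["not valid Python syntax"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.BitXor):
            problems.append("uses ^ (bitwise XOR); exponents need **")
        elif isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            problems.append(f"unknown name: {node.id}")
    return problems

for candidate in ["sqrt(16) + 2 ** 3", "2 ^ 3", "price * 0.15", "2 +"]:
    print(candidate, "->", lint_expression(candidate))
```

Running a batch of captured tool-call arguments through a check like this gives a quick count of how often each failure mode occurs after a change to the description.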

