---
title: "Tool Result Formatting: Helping LLMs Understand Tool Outputs"
description: "Master the art of formatting tool results so LLMs can effectively parse and reason about them. Covers string formatting strategies, truncation, structured vs unstructured results, error messages, and token-efficient output design."
canonical: https://callsphere.ai/blog/tool-result-formatting-helping-llms-understand-outputs
category: "Learn Agentic AI"
tags: ["Tool Design", "LLM Optimization", "Function Calling", "AI Agents"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:42.653Z
---

# Tool Result Formatting: Helping LLMs Understand Tool Outputs

> Master the art of formatting tool results so LLMs can effectively parse and reason about them. Covers string formatting strategies, truncation, structured vs unstructured results, error messages, and token-efficient output design.

## The Forgotten Half of Tool Design

Most developers spend their time on tool schemas and execution logic. But the tool result — the string you pass back to the LLM — is equally important. A well-formatted result helps the LLM extract the right information on the first pass. A poorly formatted result leads to hallucinations, missed data, or unnecessary follow-up tool calls.

The tool result is a string. That is your only interface. Everything you need the LLM to understand must be encoded in that string.
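To make that concrete, here is roughly where that string lands. The exact message shape varies by provider; this sketch uses an OpenAI-style `role: "tool"` message, and `run_tool` is a hypothetical stand-in for your own execution logic.

```python
# Hypothetical stand-in for your tool execution logic.
def run_tool(name: str, args: dict) -> str:
    return "Current weather in Austin: 72F, sunny"

# The id is echoed from the model's tool-call request (shown here as a literal).
tool_call_id = "call_abc123"
result_string = run_tool("get_weather", {"city": "Austin"})

# OpenAI-style tool result message: everything the model will ever learn
# about this call is carried in the content string.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call_id,
    "content": result_string,
}
```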

## Principle 1: Lead with the Answer

Put the most important information first. LLMs process text sequentially and are better at using information that appears early in a message. The diagram below shows where the tool result re-enters the agent loop; the code examples after it show how to order the content inside that result:

```mermaid
flowchart TD
    USER(["User message"])
    LLM["LLM call
with tools schema"]
    DECIDE{"Model wants
to call a tool?"}
    EXEC["Execute tool
sandboxed runtime"]
    RESULT["Append tool_result
to messages"]
    GUARD{"Output passes
guardrails?"}
    DONE(["Final reply"])
    BLOCK(["Refuse and log"])
    USER --> LLM --> DECIDE
    DECIDE -->|Yes| EXEC --> RESULT --> LLM
    DECIDE -->|No| GUARD
    GUARD -->|Yes| DONE
    GUARD -->|No| BLOCK
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style EXEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style DONE fill:#059669,stroke:#047857,color:#fff
    style BLOCK fill:#dc2626,stroke:#b91c1c,color:#fff
```

```python
# Bad: answer buried after metadata
def format_weather(data: dict) -> str:
    return f"""API Response:
Status: 200 OK
Cache: HIT
Request ID: abc-123
Timestamp: 2026-03-17T10:30:00Z

Location: {data['city']}
Temperature: {data['temp_f']}F
Conditions: {data['conditions']}"""

# Good: answer first, metadata optional
def format_weather(data: dict) -> str:
    return f"""Current weather in {data['city']}:
Temperature: {data['temp_f']}F ({data['temp_c']}C)
Conditions: {data['conditions']}
Humidity: {data['humidity']}%
Wind: {data['wind_speed']} mph {data['wind_dir']}"""
```

The LLM does not need your HTTP status codes, cache headers, or request IDs. It needs the weather data.

## Principle 2: Use Consistent Structure

When a tool can return different types of results, maintain a consistent format:

```python
def format_search_results(results: list[dict]) -> str:
    if not results:
        return "No results found."

    lines = [f"Found {len(results)} result(s):\n"]

    for i, r in enumerate(results, 1):
        lines.append(f"{i}. {r['title']}")
        lines.append(f"   URL: {r['url']}")
        lines.append(f"   Snippet: {r['snippet']}")
        lines.append("")

    return "\n".join(lines)
```

Numbered items with consistent indentation and field labels make it easy for the LLM to parse individual results and refer to them by number in its response.
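For reference, here is the string that formatter produces for two made-up results; the titles and URLs are purely illustrative.

```python
results = [
    {"title": "Designing Tool Schemas", "url": "https://example.com/schemas",
     "snippet": "How to name and describe tools so models call them correctly."},
    {"title": "Formatting Tool Results", "url": "https://example.com/results",
     "snippet": "Lead with the answer and keep the structure consistent."},
]

print(format_search_results(results))
# Found 2 result(s):
#
# 1. Designing Tool Schemas
#    URL: https://example.com/schemas
#    Snippet: How to name and describe tools so models call them correctly.
#
# 2. Formatting Tool Results
#    URL: https://example.com/results
#    Snippet: Lead with the answer and keep the structure consistent.
```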

## Principle 3: Truncate Thoughtfully

Raw tool outputs can be massive. Truncation is not optional — it is a core design decision:

```python
def truncate_result(content: str, max_chars: int = 4000) -> str:
    if len(content) <= max_chars:
        return content

    truncated = content[:max_chars]

    # Prefer to cut at a newline boundary near the limit
    last_newline = truncated.rfind("\n")
    if last_newline > max_chars * 0.8:
        truncated = truncated[:last_newline]

    total_chars = len(content)
    return f"{truncated}\n\n[Truncated: showing {len(truncated)} of {total_chars} characters. Call with offset parameter to see more.]"
```

The truncation message tells the LLM how much data was cut and how to get more. Without this, the LLM may assume it has all the data and produce incomplete answers.
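A quick sketch of what the caller ends up with when the limit is hit, using a synthetic log dump as the payload:

```python
# Synthetic payload: a long log dump, far beyond the 4000-character default.
long_output = "\n".join(
    f"2026-03-17 10:30:00 INFO worker processed job {i}" for i in range(1000)
)

result = truncate_result(long_output)

# `result` keeps roughly the first 4000 characters, cut at a line boundary,
# and ends with a notice of the form:
#   [Truncated: showing <kept> of <total> characters. Call with offset parameter to see more.]
```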

## Principle 4: Format Errors as Actionable Messages

Error results should tell the LLM what went wrong and what it can do about it:

```python
# Bad: generic error
def handle_error_bad(e: Exception) -> str:
    return f"Error: {str(e)}"

# Good: actionable error with context
def handle_error_good(tool_name: str, e: Exception, suggestion: str = "") -> str:
    error_msg = f"Tool '{tool_name}' failed: {str(e)}"

    if suggestion:
        error_msg += f"\nSuggestion: {suggestion}"

    return error_msg

# Usage examples
handle_error_good(
    "query_database",
    Exception("relation 'users' does not exist"),
    "The table might be named 'customers'. Call get_schema to check available tables."
)

handle_error_good(
    "fetch_webpage",
    Exception("HTTP 403 Forbidden"),
    "This site blocks automated requests. Try a different source for this information."
)
```

The suggestion guides the LLM toward recovery instead of blindly retrying the same call.
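In the execution loop this usually means wrapping every tool call so that exceptions become result strings instead of crashes. A minimal sketch, assuming a hypothetical `TOOLS` registry of callables that return strings:

```python
# Hypothetical registry; the real callables would live elsewhere.
def query_database(args: dict) -> str:
    raise Exception("relation 'users' does not exist")  # stand-in failure

TOOLS = {"query_database": query_database}

def execute_tool(tool_name: str, args: dict) -> str:
    """Always hand the model a string: either a result or an actionable error."""
    if tool_name not in TOOLS:
        return handle_error_good(
            tool_name,
            Exception("unknown tool"),
            f"Available tools: {', '.join(TOOLS)}",
        )
    try:
        return TOOLS[tool_name](args)
    except Exception as e:
        return handle_error_good(
            tool_name,
            e,
            "Check the arguments against the tool's schema before retrying.",
        )
```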

## Principle 5: Tables for Tabular Data

When returning rows of data, format them as aligned tables:

```python
def format_as_table(rows: list[dict], columns: list[str] | None = None) -> str:
    if not rows:
        return "No data."

    if columns is None:
        columns = list(rows[0].keys())

    # Calculate column widths
    widths = {col: len(col) for col in columns}
    for row in rows:
        for col in columns:
            val = str(row.get(col, ""))
            widths[col] = max(widths[col], min(len(val), 40))

    # Build header
    header = " | ".join(col.ljust(widths[col]) for col in columns)
    separator = "-+-".join("-" * widths[col] for col in columns)

    # Build rows
    lines = [header, separator]
    for row in rows:
        line = " | ".join(
            str(row.get(col, ""))[:40].ljust(widths[col])
            for col in columns
        )
        lines.append(line)

    return "\n".join(lines)
```

Tables are more token-efficient than JSON for tabular data, since field names appear once in the header rather than being repeated for every row, and the aligned columns make individual values easy for the LLM to pick out.
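To see the difference, here are the same three made-up rows rendered as JSON and as a table; JSON repeats every field name per row, while the table spells each name once in the header.

```python
import json

rows = [
    {"name": "Alice", "plan": "pro", "mrr": 49},
    {"name": "Bob", "plan": "free", "mrr": 0},
    {"name": "Carol", "plan": "team", "mrr": 199},
]

print(json.dumps(rows, indent=2))   # field names repeated for every row

print(format_as_table(rows))
# name  | plan | mrr
# ------+------+----
# Alice | pro  | 49
# Bob   | free | 0
# Carol | team | 199
```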

## Principle 6: Include Metadata When It Aids Reasoning

Some metadata helps the LLM make better decisions:

```python
def format_db_results(rows: list[dict], query_time_ms: float, total_count: int) -> str:
    output = format_as_table(rows)

    metadata = []
    if len(rows) < total_count:
        metadata.append(f"Showing {len(rows)} of {total_count} matching rows. Refine the query or paginate to see the rest.")
    metadata.append(f"Query time: {query_time_ms:.0f} ms")

    return output + "\n\n" + "\n".join(metadata)


class ResultFormatter:
    """Reusable formatter that applies the same conventions to every tool result."""

    def __init__(self, max_chars: int = 4000):
        self.max_chars = max_chars

    def format(self, data) -> str:
        if isinstance(data, list) and data and isinstance(data[0], dict):
            result = format_as_table(data)
        elif isinstance(data, dict):
            import json
            result = json.dumps(data, indent=2, default=str)
        elif isinstance(data, str):
            result = data
        else:
            result = str(data)

        return truncate_result(result, self.max_chars)

    def error(self, tool_name: str, error: str, suggestion: str = "") -> str:
        msg = f"[{tool_name}] Error: {error}"
        if suggestion:
            msg += f"\nSuggestion: {suggestion}"
        return msg

    def empty(self, tool_name: str, query_description: str = "") -> str:
        msg = f"[{tool_name}] No results found"
        if query_description:
            msg += f" for: {query_description}"
        msg += ". Try broadening your search criteria."
        return msg
```
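The `ResultFormatter` class above bundles these conventions into one place so every tool in an agent goes through the same formatting path. Wiring it into a tool-execution step might look like this; `rows` stands in for whatever your tool actually returned:

```python
formatter = ResultFormatter(max_chars=4000)

rows = [
    {"order_id": 1001, "status": "shipped", "total": 42.50},
    {"order_id": 1002, "status": "pending", "total": 18.00},
]

# Structured rows come back as a table, capped at the configured length.
tool_result = formatter.format(rows)

# Failures and empty result sets get the same consistent treatment.
error_result = formatter.error(
    "list_orders",
    "connection timed out",
    "Retry once; if it fails again, tell the user the order system is unavailable.",
)
empty_result = formatter.empty("list_orders", "orders placed after 2026-03-01")
```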

## FAQ

### Should I return JSON or plain text from tools?

It depends on the data. For structured records (API responses, database rows), JSON or tables work well. For text content (web pages, file contents, search snippets), plain text is more natural. The question that matters is: can the LLM parse the result accurately on the first attempt?

### How long should tool results be?

Keep results under 4000 characters as a default. Beyond that, you are spending tokens on data the LLM may not fully process. For data-heavy tools, return summaries or the first N results with a note about how to get more. The sweet spot is enough data to answer the question without drowning the model in noise.
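A minimal version of the "first N results plus a pointer" pattern, reusing the search formatter from earlier; the cap of 10 and the `page` parameter mentioned in the note are illustrative choices, not fixed conventions.

```python
def format_top_results(results: list[dict], limit: int = 10) -> str:
    shown = format_search_results(results[:limit])
    remaining = len(results) - limit
    if remaining > 0:
        shown += f"\n[{remaining} more result(s) available. Call search again with page=2 to see them.]"
    return shown
```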

### Should I format results differently for different LLMs?

In practice, the formatting principles above work well across all major models. The differences in how GPT-4, Claude, and Gemini process tool results are minor compared to the impact of good formatting practices. Focus on clarity, conciseness, and putting important information first.

---

#ToolDesign #LLMOptimization #FunctionCalling #AIAgents #AgenticAI #LearnAI #AIEngineering

