Shell Tool: Running System Commands from OpenAI Agents

Learn how to use the OpenAI Agents SDK ShellTool to give agents the ability to run system commands inside hosted containers, configure network policies, and manage container skills.

Why Agents Need Shell Access

Function tools let agents call Python functions, but many real-world tasks require running system commands: executing scripts, installing packages, manipulating files, running database queries, or orchestrating infrastructure. The OpenAI Agents SDK ShellTool provides a structured, sandboxed way for agents to execute shell commands inside hosted containers without giving them unrestricted access to your host system.

This post covers the ShellTool architecture, hosted container configuration, network policies, container skills, and production safety patterns.

The ShellTool Basics

The ShellTool is a built-in tool that exposes a shell interface to your agent. At its simplest, you add it like any other tool:

from agents import Agent, Runner
from agents.tools import ShellTool

agent = Agent(
    name="DevOpsAgent",
    instructions="""You are a DevOps assistant. You can run shell commands
    to help with system administration tasks. Always explain what each
    command does before running it. Never run destructive commands
    without explicit confirmation.""",
    tools=[ShellTool()],
    model="gpt-4o",
)

When the agent decides it needs to run a command, it calls the shell tool with a command string. The tool executes the command and returns stdout, stderr, and the exit code to the agent, which can then analyze the output and decide on next steps.
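
To make that return shape concrete, here is a minimal sketch built on Python's standard subprocess module. It is not the SDK's implementation: the run_command helper and the dictionary keys are hypothetical names chosen for illustration, and the command runs locally rather than in a container.

```python
import subprocess
import sys


def run_command(argv: list[str], timeout: float = 30.0) -> dict:
    """Run a command and capture what a shell tool would hand back to the model."""
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr separately
        text=True,            # decode bytes to str
        timeout=timeout,
        check=False,          # a non-zero exit is data for the agent, not an exception
    )
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }


# Use the current interpreter so the example does not depend on a particular shell.
result = run_command([sys.executable, "-c", "print('hello')"])
failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
print(result["stdout"].strip(), result["exit_code"], failed["exit_code"])
```

Returning the exit code alongside both streams lets the caller tell a successful command with warnings on stderr apart from a real failure.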

Hosted Containers: The Execution Environment

In production, you do not want agents running commands on your application server. The SDK supports hosted containers — isolated execution environments where shell commands run safely. These containers are ephemeral, disposable, and configured with limited permissions.

Here is how to configure a hosted container environment:

from agents import Agent, Runner
from agents.tools import ShellTool
from agents.containers import HostedContainer, ContainerConfig

container_config = ContainerConfig(
    image="python:3.12-slim",
    memory_limit="512m",
    cpu_limit="1.0",
    timeout=300,
    working_directory="/workspace",
)

container = HostedContainer(config=container_config)

agent = Agent(
    name="SandboxedAgent",
    instructions="You are a code execution assistant. Run Python scripts and shell commands in your sandboxed environment.",
    tools=[
        ShellTool(container=container),
    ],
    model="gpt-4o",
)

The container configuration specifies resource limits, timeouts, and the base image. The agent's commands execute inside this container, isolated from your host system. When the runner completes, the container is destroyed.
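
The create, use, destroy lifecycle described above can be illustrated by analogy with a temporary directory. This is only a sketch of the ephemeral pattern using the standard library: a local directory provides no isolation, and ephemeral_workspace is a hypothetical helper, not part of the SDK.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_workspace():
    """Yield a scratch directory that is deleted when the block exits."""
    with tempfile.TemporaryDirectory(prefix="agent-workspace-") as tmp:
        yield Path(tmp)


with ephemeral_workspace() as workspace:
    # Anything written here lives only for the duration of the run.
    (workspace / "notes.txt").write_text("scratch data")
    existed_during_run = workspace.exists()

# After the block exits, the directory and its contents are gone.
existed_after_run = workspace.exists()
print(existed_during_run, existed_after_run)
```

Because nothing survives the run, any result worth keeping has to be copied out, or written to a mounted output location, before teardown.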

Network Policies: Controlling What Agents Can Reach

One of the most important security controls for shell-enabled agents is network policy. You do not want an agent making arbitrary HTTP requests to external services, exfiltrating data, or accessing internal infrastructure it should not touch.

Network policies define what network access the container has:

from agents.containers import ContainerConfig, NetworkPolicy, NetworkRule

# Restrictive policy: only allow PyPI and internal API
policy = NetworkPolicy(
    default="deny",
    rules=[
        NetworkRule(
            destination="pypi.org",
            ports=[443],
            protocol="tcp",
            action="allow",
        ),
        NetworkRule(
            destination="api.internal.company.com",
            ports=[443],
            protocol="tcp",
            action="allow",
        ),
    ],
)

container_config = ContainerConfig(
    image="python:3.12-slim",
    memory_limit="512m",
    network_policy=policy,
)

The default="deny" setting blocks all outbound traffic except what the listed rules explicitly allow. This is a critical defense-in-depth measure: even if the model generates a malicious command that tries to exfiltrate data, the connection fails unless the destination is on the allowlist.
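
The default-deny logic can be sketched in a few lines of plain Python. The Rule dataclass and is_allowed function below are hypothetical stand-ins that mirror the fields used in the policy above. They assume exact hostname matching and first-match-wins ordering, which may differ from how a real policy engine resolves rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    destination: str
    ports: tuple[int, ...]
    protocol: str
    action: str  # "allow" or "deny"


def is_allowed(rules: list[Rule], default: str, host: str, port: int, protocol: str = "tcp") -> bool:
    """Return the verdict of the first matching rule, else fall back to the default."""
    for rule in rules:
        if rule.destination == host and port in rule.ports and rule.protocol == protocol:
            return rule.action == "allow"
    # No rule matched, so the default decides.
    return default == "allow"


rules = [
    Rule("pypi.org", (443,), "tcp", "allow"),
    Rule("api.internal.company.com", (443,), "tcp", "allow"),
]

print(is_allowed(rules, "deny", "pypi.org", 443))          # True
print(is_allowed(rules, "deny", "pypi.org", 80))           # False
print(is_allowed(rules, "deny", "attacker.example", 443))  # False
```

One practical consequence of exact matching: pip downloads package files from files.pythonhosted.org, so an allowlist meant to permit installs from PyPI would typically need that host as well.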

Policy recommendations by use case:

  • Code execution agents: Deny all network, or allow only package registries
  • Web scraping agents: Allow only specific target domains
  • DevOps agents: Allow only known internal infrastructure endpoints
  • Data analysis agents: Deny all network — mount data as volumes instead

Container Skills: Pre-Configured Capabilities

Skills are predefined sets of packages, files, and configurations that a container starts with. Instead of having the agent install dependencies on every run, you define skills that bake those dependencies into the container:

from agents.containers import ContainerConfig, ContainerSkill, HostedContainer

data_science_skill = ContainerSkill(
    name="data-science",
    packages=["pandas", "numpy", "matplotlib", "scikit-learn"],
    setup_commands=[
        "mkdir -p /workspace/data",
        "mkdir -p /workspace/output",
    ],
    environment={
        "MPLBACKEND": "Agg",
    },
)

web_scraping_skill = ContainerSkill(
    name="web-scraping",
    packages=["beautifulsoup4", "requests", "lxml"],
    setup_commands=[
        "mkdir -p /workspace/scraped",
    ],
)

container = HostedContainer(
    config=ContainerConfig(
        image="python:3.12-slim",
        memory_limit="1g",
    ),
    skills=[data_science_skill, web_scraping_skill],
)

Skills run their setup commands and install their packages when the container starts, before the agent begins executing. This reduces latency during the agent loop and ensures a consistent environment.
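
One way to picture that startup phase is as a flattening step: each skill contributes an install command, its setup commands, and its environment variables, in order. The sketch below is an assumption about how such a plan could be assembled. Skill and startup_plan are hypothetical names, and the use of pip install is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    packages: list[str] = field(default_factory=list)
    setup_commands: list[str] = field(default_factory=list)
    environment: dict[str, str] = field(default_factory=dict)


def startup_plan(skills: list[Skill]) -> tuple[list[str], dict[str, str]]:
    """Flatten skills into ordered startup commands and a merged environment."""
    commands: list[str] = []
    env: dict[str, str] = {}
    for skill in skills:
        if skill.packages:
            commands.append("pip install " + " ".join(skill.packages))
        commands.extend(skill.setup_commands)
        env.update(skill.environment)  # later skills override earlier ones
    return commands, env


data_science = Skill(
    name="data-science",
    packages=["pandas", "numpy"],
    setup_commands=["mkdir -p /workspace/data"],
    environment={"MPLBACKEND": "Agg"},
)
web_scraping = Skill(
    name="web-scraping",
    packages=["requests"],
    setup_commands=["mkdir -p /workspace/scraped"],
)

commands, env = startup_plan([data_science, web_scraping])
for command in commands:
    print(command)
print(env)
```

Keeping the plan as plain data makes it easy to log or review what a container will run before the agent gets control.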

Handling Command Output

The ShellTool returns structured output that includes stdout, stderr, and the exit code. Your agent can use all three to understand what happened:

import asyncio

from agents import Agent, Runner
from agents.tools import ShellTool

agent = Agent(
    name="DiagnosticAgent",
    instructions="""You are a diagnostic agent. When running commands:
    1. Check the exit code first — 0 means success, non-zero means failure
    2. If the command fails, read stderr for the error message
    3. Diagnose the issue and suggest or attempt a fix
    4. Never retry the same failing command more than twice""",
    tools=[ShellTool()],
    model="gpt-4o",
)

async def main():
    result = await Runner.run(
        agent,
        input="Check if PostgreSQL is running, and if so, list all databases",
    )
    print(result.final_output)

asyncio.run(main())

Given this prompt, the agent would typically run commands such as pg_isready and psql -l, examine the output, and respond with a synthesized answer. If PostgreSQL is not running, the instructions steer it to report the error from stderr rather than guess.
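
Prompt instructions are advisory, so it can help to enforce the same rules in code around the tool. The sketch below shows one hypothetical way to do that: triage reads the exit code before choosing a stream, and RetryGuard caps retries of the same failing command at two. Both names are illustrative and not part of the SDK.

```python
from collections import Counter


def triage(exit_code: int, stdout: str, stderr: str) -> str:
    """Check the exit code first, then pick the stream worth reading."""
    if exit_code == 0:
        return f"ok: {stdout.strip()}"
    return f"failed ({exit_code}): {stderr.strip() or 'no stderr output'}"


class RetryGuard:
    """Track failures per command so the same one is not retried endlessly."""

    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries
        self.failures = Counter()

    def record(self, command: str, exit_code: int) -> None:
        if exit_code != 0:
            self.failures[command] += 1

    def may_run(self, command: str) -> bool:
        # One initial attempt plus max_retries retries are permitted.
        return self.failures[command] <= self.max_retries


guard = RetryGuard()
command = "pg_isready"
attempts = 0
while guard.may_run(command):
    attempts += 1
    guard.record(command, exit_code=2)  # simulate a command that always fails

print(attempts)  # 3: the initial attempt plus two retries
print(triage(2, "", "no response\n"))
```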

Security Patterns for Production

Never run ShellTool on the host. Always use a container or remote execution environment. A prompt injection that reaches a host shell can compromise your entire system.

Limit command execution time. Set the container timeout to prevent runaway processes. Commands that hang — like a curl to an unresponsive server — should be killed after a reasonable period.
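
At the level of a single command, the same idea looks like this with the standard library. This is a sketch of per-command timeouts using subprocess, not the container-level timeout itself, and run_with_timeout is a hypothetical helper.

```python
import subprocess
import sys


def run_with_timeout(argv: list[str], timeout_seconds: float) -> dict:
    """Run a command, killing it if it exceeds the time budget."""
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return {"timed_out": True, "exit_code": None}
    return {"timed_out": False, "exit_code": completed.returncode}


# A child process that sleeps far longer than the budget allows.
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 1.0)
fast = run_with_timeout([sys.executable, "-c", "pass"], 10.0)
print(slow, fast)
```

Reporting the timeout as structured data, rather than raising, gives the agent something it can reason about on its next step.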

Log everything. Record every command the agent runs, its output, and the agent's reasoning for running it. This audit trail is essential for debugging, compliance, and detecting misuse.
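
A simple format for such an audit trail is one JSON object per line. The sketch below assumes the fields named above (command, output, reasoning) plus a timestamp. The audit_record helper and the in-memory buffer are illustrative stand-ins for an append-only file or log pipeline.

```python
import io
import json
import time


def audit_record(command: str, exit_code: int, stdout: str, stderr: str, reasoning: str) -> str:
    """Serialize one command execution as a single JSON line."""
    return json.dumps(
        {
            "timestamp": time.time(),
            "command": command,
            "exit_code": exit_code,
            "stdout": stdout,
            "stderr": stderr,
            "reasoning": reasoning,
        },
        sort_keys=True,
    )


log = io.StringIO()  # stand-in for an append-only file or log shipper
log.write(audit_record("ls /workspace", 0, "input\noutput\n", "", "List the workspace") + "\n")

# json.dumps escapes embedded newlines, so each record stays on one line.
entry = json.loads(log.getvalue().splitlines()[0])
print(entry["command"], entry["exit_code"])
```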

Use read-only file mounts. If the agent needs access to data files, mount them as read-only volumes. The agent should write outputs to a dedicated writable directory:

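from agents.containers import ContainerConfig

container_config = ContainerConfig(
    image="python:3.12-slim",
    volumes={
        # Input data: readable by the agent, never modified
        "/data/input": {"bind": "/workspace/input", "mode": "ro"},
        # Dedicated writable directory for results
        "/data/output": {"bind": "/workspace/output", "mode": "rw"},
    },
)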
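
The mount modes are what actually enforce this, but an application-level pre-check can reject obviously bad write targets before a command runs. The sketch below assumes the two container-side mount points from the volume configuration in this section. MOUNTS and is_writable are hypothetical names.

```python
import posixpath
from pathlib import PurePosixPath

# Container-side mount points and their modes.
MOUNTS = {
    "/workspace/input": "ro",
    "/workspace/output": "rw",
}


def is_writable(path: str) -> bool:
    """Return True only if the path sits under a mount declared read-write."""
    # Collapse ".." segments first so a path cannot escape its mount lexically.
    target = PurePosixPath(posixpath.normpath(path))
    for mount_point, mode in MOUNTS.items():
        mount = PurePosixPath(mount_point)
        if target == mount or mount in target.parents:
            return mode == "rw"
    return False  # paths outside every declared mount are treated as not writable


print(is_writable("/workspace/output/report.csv"))         # True
print(is_writable("/workspace/input/data.csv"))            # False
print(is_writable("/workspace/output/../input/data.csv"))  # False
```

This check is purely lexical and does not follow symlinks, so it complements the read-only mount rather than replacing it.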

Combine with approval gates. For sensitive operations, pair the ShellTool with the approval mechanism covered later in this series. The agent proposes a command, a human reviews it, and only approved commands execute.
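
The propose, review, execute flow can be sketched with two callbacks. This is not the SDK's approval mechanism: gated_run, reviewer, and fake_execute are hypothetical names, and the reviewer here is a trivial rule standing in for a human decision.

```python
from typing import Callable


def gated_run(
    command: str,
    approve: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Execute the command only if the reviewer callback approves it."""
    if not approve(command):
        return f"rejected: {command}"
    return execute(command)


def reviewer(command: str) -> bool:
    # Stand-in for a human decision: reject any command that invokes rm.
    return "rm" not in command.split()


def fake_execute(command: str) -> str:
    # Stand-in for the real shell tool, so the example has no side effects.
    return f"executed: {command}"


print(gated_run("ls -la", reviewer, fake_execute))             # executed: ls -la
print(gated_run("rm -rf /workspace", reviewer, fake_execute))  # rejected: rm -rf /workspace
```

Keeping the approval decision outside the agent means a prompt injection cannot talk its way past the gate.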

The ShellTool transforms agents from API callers into full system operators. With proper container isolation, network policies, and skills configuration, you can safely delegate complex system administration and data processing tasks to AI agents while maintaining tight security boundaries.
