Shell Tool: Running System Commands from OpenAI Agents
Learn how to use the OpenAI Agents SDK ShellTool to give agents the ability to run system commands inside hosted containers, configure network policies, and manage container skills.
Why Agents Need Shell Access
Function tools let agents call Python functions, but many real-world tasks require running system commands: executing scripts, installing packages, manipulating files, running database queries, or orchestrating infrastructure. The OpenAI Agents SDK ShellTool provides a structured, sandboxed way for agents to execute shell commands inside hosted containers without giving them unrestricted access to your host system.
This post covers the ShellTool architecture, hosted container configuration, network policies, container skills, and production safety patterns.
The ShellTool Basics
The ShellTool is a built-in tool that exposes a shell interface to your agent. At its simplest, you add it like any other tool:
from agents import Agent, Runner, ShellTool

agent = Agent(
    name="DevOpsAgent",
    instructions="""You are a DevOps assistant. You can run shell commands
    to help with system administration tasks. Always explain what each
    command does before running it. Never run destructive commands
    without explicit confirmation.""",
    # Expose the shell interface alongside any other tools
    tools=[ShellTool()],
    model="gpt-4o",
)
When the agent decides it needs to run a command, it calls the shell tool with a command string. The tool executes the command and returns both stdout and stderr to the agent, which can then analyze the output and decide on next steps.
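To make that return value concrete, here is a minimal sketch of what "execute the command and return stdout and stderr" amounts to, using only Python's standard `subprocess` module. The `run_command` helper is hypothetical and is not part of the Agents SDK. It runs on the local machine purely for illustration, whereas the rest of this post assumes commands run inside a hosted container.

```python
import shlex
import subprocess
import sys


def run_command(command: str) -> dict:
    """Run a command string and return its stdout, stderr, and exit code.

    Hypothetical helper for illustration only; not part of the Agents SDK.
    """
    # Split the string into an argument list so no shell parses it.
    argv = shlex.split(command)
    completed = subprocess.run(argv, capture_output=True, text=True)
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }


# The current Python interpreter is a portable stand-in for a real command.
result = run_command(f'{shlex.quote(sys.executable)} -c "print(40 + 2)"')
print(result["exit_code"], result["stdout"].strip())
```

The agent sees all of these fields, which is what lets it tell a successful command with empty output apart from a failed one.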
Hosted Containers: The Execution Environment
In production, you do not want agents running commands on your application server. The SDK supports hosted containers — isolated execution environments where shell commands run safely. These containers are ephemeral, disposable, and configured with limited permissions.
Here is how to configure a hosted container environment:
from agents import Agent, Runner, ShellTool
from agents.containers import HostedContainer, ContainerConfig

# Base image, resource limits, and timeout for the sandbox
container_config = ContainerConfig(
    image="python:3.12-slim",
    memory_limit="512m",
    cpu_limit="1.0",
    timeout=300,
    working_directory="/workspace",
)

container = HostedContainer(config=container_config)

agent = Agent(
    name="SandboxedAgent",
    instructions="You are a code execution assistant. Run Python scripts and shell commands in your sandboxed environment.",
    tools=[
        # Commands from this tool run inside the container, not on the host
        ShellTool(container=container),
    ],
    model="gpt-4o",
)
The container configuration specifies resource limits, timeouts, and the base image. The agent's commands execute inside this container, isolated from your host system. When the runner completes, the container is destroyed.
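This post does not say how hosted containers are implemented. As a rough mental model, and assuming a Docker-compatible runtime, the fields above map onto standard `docker run` flags. The `build_docker_argv` helper below is hypothetical: it only builds the argument list and never starts a container.

```python
def build_docker_argv(image, memory_limit, cpu_limit, working_directory, command):
    """Build a `docker run` argument list for a disposable, resource-limited container.

    Hypothetical helper; an assumption about how such limits could be expressed,
    not a description of what the SDK does internally.
    """
    return [
        "docker", "run",
        "--rm",                          # remove the container when the command exits
        "--memory", memory_limit,        # hard memory cap, e.g. "512m"
        "--cpus", cpu_limit,             # CPU share, e.g. "1.0"
        "--workdir", working_directory,  # directory the command starts in
        image,
        *command,
    ]


argv = build_docker_argv(
    image="python:3.12-slim",
    memory_limit="512m",
    cpu_limit="1.0",
    working_directory="/workspace",
    command=["python", "--version"],
)
print(" ".join(argv))
```

The timeout has no equivalent flag in this sketch. Whatever launches the container would have to enforce it.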
Network Policies: Controlling What Agents Can Reach
One of the most important security controls for shell-enabled agents is network policy. You do not want an agent making arbitrary HTTP requests to external services, exfiltrating data, or accessing internal infrastructure it should not touch.
Network policies define what network access the container has:
from agents.containers import ContainerConfig, NetworkPolicy, NetworkRule

# Restrictive policy: only allow PyPI and internal API
policy = NetworkPolicy(
    default="deny",
    rules=[
        NetworkRule(
            destination="pypi.org",
            ports=[443],
            protocol="tcp",
            action="allow",
        ),
        NetworkRule(
            destination="api.internal.company.com",
            ports=[443],
            protocol="tcp",
            action="allow",
        ),
    ],
)

# Attach the policy to the container configuration
container_config = ContainerConfig(
    image="python:3.12-slim",
    memory_limit="512m",
    network_policy=policy,
)
The default="deny" setting blocks all outbound traffic except the explicitly listed rules. This is a critical defense-in-depth measure: even if the model generates a malicious command that tries to exfiltrate data, the request cannot reach any destination outside the allowlist. The allowed destinations themselves remain reachable, so keep the list as short as the task permits.
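The default-deny logic is easy to reason about when written out. The sketch below is a plain-Python model of how such a policy could be evaluated. The `Rule` class and `evaluate` function are hypothetical, and two behaviors are assumptions rather than documented facts: hostnames are matched exactly, and the first matching rule wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One allow or deny rule (hypothetical model, not an SDK class)."""
    destination: str
    ports: tuple
    protocol: str
    action: str  # "allow" or "deny"


def evaluate(rules, default, host, port, protocol="tcp"):
    """Return the action for a connection attempt; fall back to the default."""
    for rule in rules:
        # Assumption: exact hostname match, first matching rule wins.
        if rule.destination == host and port in rule.ports and rule.protocol == protocol:
            return rule.action
    return default


rules = [
    Rule("pypi.org", (443,), "tcp", "allow"),
    Rule("api.internal.company.com", (443,), "tcp", "allow"),
]

print(evaluate(rules, "deny", "pypi.org", 443))          # listed host and port
print(evaluate(rules, "deny", "attacker.example", 443))  # unlisted host
print(evaluate(rules, "deny", "pypi.org", 80))           # listed host, unlisted port
```

Only the first call matches a rule. The other two fall through to the default, which is why the default should be "deny".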
Policy recommendations by use case:
- Code execution agents: Deny all network, or allow only package registries
- Web scraping agents: Allow only specific target domains
- DevOps agents: Allow only known internal infrastructure endpoints
- Data analysis agents: Deny all network — mount data as volumes instead
Container Skills: Pre-Configured Capabilities
Skills are predefined sets of packages, files, and configurations that a container starts with. Instead of having the agent install dependencies during its own loop, you define skills that provision those dependencies in the container before the agent starts:
from agents.containers import ContainerConfig, ContainerSkill, HostedContainer

data_science_skill = ContainerSkill(
    name="data-science",
    packages=["pandas", "numpy", "matplotlib", "scikit-learn"],
    setup_commands=[
        "mkdir -p /workspace/data",
        "mkdir -p /workspace/output",
    ],
    environment={
        # Non-interactive matplotlib backend for headless containers
        "MPLBACKEND": "Agg",
    },
)

web_scraping_skill = ContainerSkill(
    name="web-scraping",
    packages=["beautifulsoup4", "requests", "lxml"],
    setup_commands=[
        "mkdir -p /workspace/scraped",
    ],
)

# A container can combine several skills
container = HostedContainer(
    config=ContainerConfig(
        image="python:3.12-slim",
        memory_limit="1g",
    ),
    skills=[data_science_skill, web_scraping_skill],
)
Skills run their setup commands and install their packages when the container starts, before the agent begins executing. This reduces latency during the agent loop and ensures a consistent environment.
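One way to picture this startup phase is as a flat list of commands derived from the skill definitions. The sketch below uses plain dictionaries and a hypothetical `startup_commands` helper. Installing packages with `pip` and running installs before setup commands are both assumptions made for illustration.

```python
import shlex


def startup_commands(skills):
    """Flatten skill definitions into the commands to run at container start.

    Hypothetical helper. Assumes packages are installed with pip and that
    installs run before each skill's setup commands.
    """
    commands = []
    for skill in skills:
        packages = skill.get("packages", [])
        if packages:
            quoted = " ".join(shlex.quote(name) for name in packages)
            commands.append(f"pip install {quoted}")
        commands.extend(skill.get("setup_commands", []))
    return commands


skills = [
    {
        "name": "data-science",
        "packages": ["pandas", "numpy"],
        "setup_commands": ["mkdir -p /workspace/data"],
    },
    {
        "name": "web-scraping",
        "packages": ["requests"],
        "setup_commands": ["mkdir -p /workspace/scraped"],
    },
]

for command in startup_commands(skills):
    print(command)
```

Because the list is derived only from the skill definitions, every container built from the same skills starts in the same state.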
Handling Command Output
The ShellTool returns structured output that includes stdout, stderr, and the exit code. Your agent can use all three to understand what happened:
import asyncio

from agents import Agent, Runner, ShellTool

agent = Agent(
    name="DiagnosticAgent",
    instructions="""You are a diagnostic agent. When running commands:
    1. Check the exit code first: 0 means success, non-zero means failure
    2. If the command fails, read stderr for the error message
    3. Diagnose the issue and suggest or attempt a fix
    4. Never retry the same failing command more than twice""",
    tools=[ShellTool()],
    model="gpt-4o",
)


async def main():
    # Run the agent loop until it produces a final answer
    result = await Runner.run(
        agent,
        input="Check if PostgreSQL is running, and if so, list all databases",
    )
    print(result.final_output)


asyncio.run(main())
Given this prompt, the agent will typically run commands such as pg_isready and psql -l, examine the output, and respond with a synthesized answer. If PostgreSQL is not running, the instructions steer it to report the error from stderr rather than guess.
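The agent applies those four rules through its own reasoning, but the same decision order can be written out in plain Python. The `run_with_diagnosis` helper below is a hypothetical sketch that uses the standard `subprocess` module. It reads "never retry more than twice" as one initial attempt plus at most two retries, which is an interpretation rather than something the SDK enforces.

```python
import subprocess
import sys


def run_with_diagnosis(argv, max_retries=2):
    """Run a command, check the exit code first, and fall back to stderr on failure.

    Hypothetical helper for illustration; the retry cap is an interpretation
    of the agent instructions above.
    """
    last = None
    # One initial attempt plus up to max_retries retries.
    for attempt in range(1, max_retries + 2):
        last = subprocess.run(argv, capture_output=True, text=True)
        if last.returncode == 0:
            return {"ok": True, "attempts": attempt, "detail": last.stdout.strip()}
    # Every attempt failed: report stderr instead of guessing at the cause.
    return {"ok": False, "attempts": max_retries + 1, "detail": last.stderr.strip()}


# The current interpreter stands in for commands like pg_isready.
healthy = run_with_diagnosis([sys.executable, "-c", "print('accepting connections')"])
failing = run_with_diagnosis(
    [sys.executable, "-c", "import sys; sys.stderr.write('no response'); sys.exit(2)"]
)
print(healthy)
print(failing)
```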
Security Patterns for Production
Never run ShellTool on the host. Always use a container or remote execution environment. A prompt injection that reaches a host shell can compromise your entire system.
Limit command execution time. Set the container timeout to prevent runaway processes. Commands that hang — like a curl to an unresponsive server — should be killed after a reasonable period.
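The container timeout is the outer limit. The same idea can also be applied per command. As a minimal sketch, Python's standard `subprocess.run` accepts a `timeout` argument, kills the child process when it expires, and raises `subprocess.TimeoutExpired`. The `run_with_timeout` wrapper is a hypothetical helper, not an SDK function.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run a command and report whether it was killed for exceeding the time limit."""
    try:
        completed = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child process at this point.
        return {"timed_out": True, "exit_code": None}
    return {"timed_out": False, "exit_code": completed.returncode}


# A sleeping process stands in for a command that hangs.
hung = run_with_timeout([sys.executable, "-c", "import time; time.sleep(10)"], 0.5)
quick = run_with_timeout([sys.executable, "-c", "pass"], 30)
print(hung, quick)
```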
Log everything. Record every command the agent runs, its output, and the agent's reasoning for running it. This audit trail is essential for debugging, compliance, and detecting misuse.
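A simple format for such an audit trail is one JSON object per line. The sketch below uses only the standard library. The `write_audit_record` helper and its field names are hypothetical choices, and an in-memory buffer stands in for a real log file or logging pipeline.

```python
import io
import json
import time


def write_audit_record(log, command, reasoning, stdout, stderr, exit_code):
    """Append one JSON line describing a single command execution."""
    record = {
        "timestamp": time.time(),
        "command": command,
        "reasoning": reasoning,
        "stdout": stdout,
        "stderr": stderr,
        "exit_code": exit_code,
    }
    log.write(json.dumps(record) + "\n")
    return record


# In-memory buffer as a stand-in for a file opened in append mode.
audit_log = io.StringIO()
write_audit_record(
    audit_log,
    command="df -h",
    reasoning="Check free disk space before writing output files",
    stdout="(output omitted)",
    stderr="",
    exit_code=0,
)
print(audit_log.getvalue())
```

Keeping the reasoning next to the command makes it possible to review later why the agent ran something, not just what it ran.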
Use read-only file mounts. If the agent needs access to data files, mount them as read-only volumes. The agent should write outputs to a dedicated writable directory:
container_config = ContainerConfig(
    image="python:3.12-slim",
    volumes={
        "/data/input": {"bind": "/workspace/input", "mode": "ro"},
        "/data/output": {"bind": "/workspace/output", "mode": "rw"},
    },
)
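For readers who think in Docker terms, this volume mapping has the same shape as the `host:container:mode` strings accepted by `docker run --volume`. The `volume_flags` helper below is a hypothetical sketch that performs that translation. It assumes a Docker-compatible runtime, which this post does not confirm.

```python
def volume_flags(volumes):
    """Translate a {host_path: {"bind": ..., "mode": ...}} mapping into --volume flags.

    Hypothetical helper; assumes a Docker-compatible runtime.
    """
    flags = []
    for host_path, spec in volumes.items():
        flags.extend(["--volume", f"{host_path}:{spec['bind']}:{spec['mode']}"])
    return flags


flags = volume_flags(
    {
        "/data/input": {"bind": "/workspace/input", "mode": "ro"},
        "/data/output": {"bind": "/workspace/output", "mode": "rw"},
    }
)
print(flags)
```

With the input mounted read-only, a command that tries to modify or delete source data fails inside the container instead of damaging the original files.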
Combine with approval gates. For sensitive operations, pair the ShellTool with the approval mechanism covered later in this series. The agent proposes a command, a human reviews it, and only approved commands execute.
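The propose, review, execute flow can be sketched independently of any SDK. In the example below, `run_if_approved` is a hypothetical helper, `cautious_reviewer` is a stand-in for a human reviewer, and `fake_execute` records commands instead of running them.

```python
def run_if_approved(command, approve, execute):
    """Execute a proposed command only if the reviewer approves it."""
    if approve(command):
        return {"approved": True, "result": execute(command)}
    # Rejected commands never reach the executor.
    return {"approved": False, "result": None}


def cautious_reviewer(command):
    """Stand-in for a human reviewer: rejects anything that starts with rm."""
    return not command.strip().startswith("rm ")


executed = []


def fake_execute(command):
    """Record the command instead of running it, to keep the sketch side-effect free."""
    executed.append(command)
    return f"ran: {command}"


listing = run_if_approved("ls /workspace", cautious_reviewer, fake_execute)
deletion = run_if_approved("rm -rf /workspace", cautious_reviewer, fake_execute)
print(listing, deletion, executed)
```

The key property is structural: the executor is only reachable through the approval check, so a rejected command cannot run by accident.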
Key Takeaways
The ShellTool transforms agents from API callers into full system operators. With proper container isolation, network policies, and skills configuration, you can safely delegate complex system administration and data processing tasks to AI agents while maintaining tight security boundaries.
Written by
CallSphere Team