Procedural Memory for AI Agents: Learning and Remembering How to Execute Tasks

Declarative vs Procedural Memory

Most agent memory systems store facts — what the agent knows. "The user's timezone is PST." "The database uses PostgreSQL." This is declarative memory. But agents also need to remember how to do things. How to deploy a service. How to debug a failing test. How to file a bug report in the team's specific format.

Procedural memory stores sequences of actions that accomplish a task. Once an agent successfully completes a complex procedure, it records the steps so it can replay and refine the procedure next time instead of reasoning from scratch.
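
To make the distinction concrete, here is a minimal sketch of the two kinds of memory side by side. The dictionary keys, action names, and parameter values are illustrative assumptions, not a schema used later in this article.

```python
# Declarative memory: facts the agent can look up (illustrative values).
declarative_memory = {
    "user_timezone": "PST",
    "database": "PostgreSQL",
}

# Procedural memory: an ordered list of actions that accomplishes a task.
# The action names and parameters below are hypothetical.
procedural_memory = {
    "file_bug_report": [
        {"action": "collect_logs", "parameters": {"lines": 200}},
        {"action": "fill_template", "parameters": {"template": "team_bug.md"}},
        {"action": "submit_ticket", "parameters": {"project": "BACKEND"}},
    ],
}

# A fact is read in one lookup; a procedure is walked step by step, in order.
fact = declarative_memory["database"]
actions = [step["action"] for step in procedural_memory["file_bug_report"]]
print(fact)
print(actions)
```

The order of the list matters for the procedure, while the order of the facts does not.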

Skill Storage

A procedure is a named sequence of steps, each with an action type, parameters, expected outcomes, and timing metadata.

flowchart TD
    MSG(["New message"])
    WORKING["Working memory<br/>rolling window"]
    EPISODIC[("Episodic memory<br/>past sessions")]
    SEMANTIC[("Semantic memory<br/>facts and preferences")]
    SUM["Summarizer<br/>compresses old turns"]
    ROUTER{"Retrieve<br/>needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater<br/>writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional
from enum import Enum

class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"

@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    actual_outcome: str = ""
    status: StepStatus = StepStatus.PENDING
    duration_ms: float = 0
    error: str = ""
    notes: str = ""

@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    last_executed: Optional[datetime] = None
    execution_count: int = 0
    success_rate: float = 0.0
    avg_duration_ms: float = 0.0
    tags: list[str] = field(default_factory=list)
    version: int = 1

class ProceduralMemory:
    def __init__(self):
        self.procedures: dict[str, Procedure] = {}
        self.execution_log: list[dict] = []

    def store_procedure(
        self,
        name: str,
        description: str,
        steps: list[dict],
        tags: list[str] | None = None,
    ) -> Procedure:
        proc_steps = [
            ProcedureStep(
                action=s["action"],
                parameters=s.get("parameters", {}),
                expected_outcome=s.get("expected_outcome", ""),
            )
            for s in steps
        ]
        proc = Procedure(
            name=name,
            description=description,
            steps=proc_steps,
            tags=tags or [],
        )
        self.procedures[name] = proc
        return proc

Procedure Recording

The most natural way to build procedural memory is to record it. As the agent executes a task, it logs each step. When the task finishes, the steps that succeeded become a stored procedure, and failed attempts are dropped.

class ProcedureRecorder:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.steps: list[ProcedureStep] = []
        self.start_time: datetime | None = None

    def start(self):
        self.start_time = datetime.now()
        self.steps = []

    def record_step(
        self,
        action: str,
        parameters: dict,
        outcome: str = "",
        status: StepStatus = StepStatus.SUCCESS,
        duration_ms: float = 0,
    ):
        step = ProcedureStep(
            action=action,
            parameters=parameters,
            actual_outcome=outcome,
            status=status,
            duration_ms=duration_ms,
        )
        self.steps.append(step)

    def finalize(
        self, memory: ProceduralMemory
    ) -> Procedure | None:
        if not self.steps:
            return None

        successful_steps = [
            ProcedureStep(
                action=s.action,
                parameters=s.parameters,
                expected_outcome=s.actual_outcome,
            )
            for s in self.steps
            if s.status == StepStatus.SUCCESS
        ]

        if not successful_steps:
            return None

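        proc = Procedure(
            name=self.name,
            description=self.description,
            steps=successful_steps,
        )
        # Seed the statistics from this first, recorded run
        proc.execution_count = 1
        proc.success_rate = 1.0
        proc.avg_duration_ms = sum(
            s.duration_ms
            for s in self.steps
            if s.status == StepStatus.SUCCESS
        )
        proc.last_executed = datetime.now()
        memory.procedures[self.name] = proc
        return proc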
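
Calling a recording method by hand after every action is easy to forget. One way to make logging closer to automatic is a small wrapper that runs the action, times it, and appends a record whether it succeeds or fails. The sketch below is an assumption about how that could look: `run_and_record`, the plain `log` list (standing in for a recorder), and the two demo tools are all hypothetical names.

```python
import time
from typing import Any, Callable


def run_and_record(
    log: list[dict],
    action: str,
    parameters: dict[str, Any],
    fn: Callable[..., Any],
) -> Any:
    """Run fn(**parameters), time it, and append one record to log."""
    start = time.perf_counter()
    try:
        outcome = fn(**parameters)
        status, error = "success", ""
    except Exception as exc:  # keep the failure in the log instead of losing it
        outcome, status, error = None, "failed", str(exc)
    duration_ms = (time.perf_counter() - start) * 1000
    log.append({
        "action": action,
        "parameters": parameters,
        "outcome": "" if outcome is None else str(outcome),
        "status": status,
        "error": error,
        "duration_ms": duration_ms,
    })
    return outcome


# Hypothetical tools used only for this demonstration.
def run_tests(suite: str) -> str:
    return f"suite {suite} passed"


def build_image(tag: str) -> str:
    raise RuntimeError("docker daemon not running")


log: list[dict] = []
run_and_record(log, "run_tests", {"suite": "all"}, run_tests)
run_and_record(log, "build_image", {"tag": "v1.2.3"}, build_image)
for entry in log:
    print(entry["action"], entry["status"], entry["error"])
```

Each log entry carries the same fields a recorded step needs (action, parameters, outcome, status, duration), so it could be passed on to a recorder unchanged.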

Replay

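When the agent encounters a familiar task, it retrieves the stored procedure and replays the steps rather than reasoning from scratch. Each step is executed with the recorded parameters, optionally overridden per action through adapt_params. A step counts as failed when the executor raises an exception, and every outcome is returned so the caller can compare it against the expected outcome.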

# Method of ProceduralMemory: place it inside the class body.
async def replay_procedure(
    self,
    name: str,
    executor,  # async callable: (action, params) -> outcome
    adapt_params: dict | None = None,  # per-action overrides, keyed by action
) -> dict:
    proc = self.procedures.get(name)
    if not proc:
        return {"success": False, "error": "Procedure not found"}

    results = []
    all_success = True
    total_ms = 0

    for i, step in enumerate(proc.steps):
        params = dict(step.parameters)
        if adapt_params:
            params.update(adapt_params.get(step.action, {}))

        start = datetime.now()
        try:
            outcome = await executor(step.action, params)
            duration = (datetime.now() - start).total_seconds() * 1000
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "success",
                "outcome": str(outcome),
                "duration_ms": duration,
            })
            total_ms += duration
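        except Exception as e:
            # Later steps still run after a failure; break out of the
            # loop here if each step depends on the previous one.
            all_success = False
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "failed",
                "error": str(e),
            })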

    # Update procedure statistics as running averages
    proc.execution_count += 1
    proc.last_executed = datetime.now()
    total_runs = proc.execution_count
    run_score = 1.0 if all_success else 0.0
    proc.success_rate = (
        (proc.success_rate * (total_runs - 1) + run_score)
        / total_runs
    )
    proc.avg_duration_ms = (
        (proc.avg_duration_ms * (total_runs - 1) + total_ms)
        / total_runs
    )

    # Keep the per-step results so later analysis can use them
    self.execution_log.append({"procedure": name, "steps": results})

    return {"success": all_success, "steps": results}

Optimization Over Time

Each execution adds data that can be used to refine the procedure. Steps that consistently fail can be removed or replaced, and slow steps can be flagged for optimization. The agent can also merge similar procedures, keeping the most efficient variant.

def find_similar(
    self, description: str, threshold: int = 2
) -> list[Procedure]:
    """Find procedures sharing at least `threshold` description words.

    Method of ProceduralMemory. Results are sorted by success rate.
    """
    query_words = set(description.lower().split())
    results = []
    for proc in self.procedures.values():
        proc_words = set(proc.description.lower().split())
        overlap = len(query_words & proc_words)
        if overlap >= threshold:
            results.append(proc)
    results.sort(key=lambda p: p.success_rate, reverse=True)
    return results
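
Counting shared words is simple, but common words such as "the" and "to" can push unrelated descriptions over a threshold of 2. The sketch below reproduces the same overlap count as a standalone function and shows the effect of filtering a small stop-word list. `keyword_overlap` and `STOP_WORDS` are illustrative assumptions, and the list is far from exhaustive.

```python
from typing import Iterable

STOP_WORDS = {"a", "an", "the", "to", "of", "in"}  # illustrative only


def keyword_overlap(
    query: str, description: str, stop_words: Iterable[str] = ()
) -> int:
    """Count distinct lowercase words shared by both strings."""
    ignored = set(stop_words)
    query_words = set(query.lower().split()) - ignored
    desc_words = set(description.lower().split()) - ignored
    return len(query_words & desc_words)


stored = "Deploy backend service to production"

# Related task: shares "deploy", "backend", and "to".
related = keyword_overlap("deploy the backend to staging", stored)

# Unrelated tasks that still share "the" and "to".
unrelated = keyword_overlap(
    "move the files to archive", "Upload the report to storage"
)

# The same comparisons with stop words removed.
related_filtered = keyword_overlap(
    "deploy the backend to staging", stored, STOP_WORDS
)
unrelated_filtered = keyword_overlap(
    "move the files to archive", "Upload the report to storage", STOP_WORDS
)

print(related, unrelated, related_filtered, unrelated_filtered)
```

With stop words filtered, the related pair still meets a threshold of 2 while the unrelated pair drops to 0. Embedding-based similarity is a common alternative when descriptions are phrased very differently.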

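def optimize_procedure(self, name: str) -> Procedure | None:
    # Method of ProceduralMemory
    proc = self.procedures.get(name)
    if not proc or proc.execution_count < 3:
        return None  # Need enough data to optimize

    # Net score per step (successes minus failures), tallied from
    # execution_log. Assumes entries shaped like
    # {"procedure": name, "steps": [{"step": 1, "status": "failed"}, ...]}
    net = [0] * len(proc.steps)
    for run in self.execution_log:
        if run.get("procedure") != name:
            continue
        for r in run.get("steps", []):
            i = r["step"] - 1
            if 0 <= i < len(net):
                net[i] += -1 if r["status"] == "failed" else 1

    # Remove steps that fail more than they succeed
    proc.steps = [s for i, s in enumerate(proc.steps) if net[i] >= 0]
    proc.version += 1
    # Logged step numbers no longer match, so drop this procedure's runs
    self.execution_log = [
        r for r in self.execution_log if r.get("procedure") != name
    ]
    return proc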
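
Flagging slow steps, mentioned at the start of this section, can be sketched in a similar way. The version below assumes per-step durations have been collected across several runs and flags any step whose median duration exceeds a multiple of the median across all steps. The function name, the `factor` default of 3.0, and the sample timings are all assumptions chosen for illustration.

```python
from statistics import median


def flag_slow_steps(
    durations_ms: dict[str, list[float]], factor: float = 3.0
) -> list[str]:
    """Return actions whose median duration exceeds factor times the
    median of all per-step medians."""
    per_step = {
        action: median(samples)
        for action, samples in durations_ms.items()
        if samples  # skip steps with no timing data
    }
    if not per_step:
        return []
    baseline = median(per_step.values())
    return [a for a, m in per_step.items() if m > factor * baseline]


# Made-up timings in milliseconds for three runs of a deployment.
durations = {
    "run_tests": [40000, 42000, 41000],
    "build_image": [9000, 9500, 8800],
    "push_image": [3000, 3200, 3100],
    "apply_k8s": [1500, 1400, 1600],
    "verify_health": [200, 250, 220],
}

slow = flag_slow_steps(durations)
print(slow)
```

Using medians rather than means keeps a single unusually slow run from flagging a step that is normally fast.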

Practical Example

memory = ProceduralMemory()

# Record a deployment procedure
recorder = ProcedureRecorder(
    "deploy_backend", "Deploy backend service to production"
)
recorder.start()

recorder.record_step(
    "run_tests", {"suite": "all"}, "All 142 tests passed"
)
recorder.record_step(
    "build_image", {"tag": "v1.2.3"}, "Image built successfully"
)
recorder.record_step(
    "push_image", {"registry": "gcr.io/myproject"}, "Pushed"
)
recorder.record_step(
    "apply_k8s", {"manifest": "deploy.yaml"}, "Rollout started"
)
recorder.record_step(
    "verify_health", {"url": "/health"}, "200 OK"
)

recorder.finalize(memory)

# Next time — replay instead of reasoning from scratch
# result = await memory.replay_procedure("deploy_backend", executor)

FAQ

How does procedural memory differ from a simple script?

A script is static — it runs the same steps every time. Procedural memory is adaptive. The agent can modify parameters based on context, skip steps that are not needed, and improve the procedure based on execution history. It is a living script that learns.

When should an agent create a new procedure vs reuse an existing one?

Use the find_similar method to check for existing procedures before recording a new one. If a similar procedure exists with a high success rate, replay it with adapted parameters. Create a new procedure only when the task is genuinely novel.

Can procedures compose — calling one procedure from within another?

Yes. Treat each procedure as a callable action. A "deploy_full_stack" procedure can include a step whose action is "replay_procedure" with a parameter of "deploy_backend". This creates reusable, composable skill libraries.
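
A minimal sketch of that composition follows. It assumes procedures are stored as lists of (action, parameters) pairs and that the runner itself handles the "replay_procedure" action by recursing. `PROCEDURES`, `run_procedure`, the "build_frontend" action, and the depth limit are illustrative assumptions, and the sketch is synchronous to keep it short.

```python
# name -> ordered list of (action, parameters); contents are hypothetical.
PROCEDURES = {
    "deploy_backend": [
        ("build_image", {"tag": "v1.2.3"}),
        ("apply_k8s", {"manifest": "deploy.yaml"}),
    ],
    "deploy_full_stack": [
        ("replay_procedure", {"name": "deploy_backend"}),
        ("build_frontend", {"env": "production"}),
    ],
}


def run_procedure(
    name: str, trace: list[str], depth: int = 0, max_depth: int = 5
) -> None:
    """Walk a procedure, expanding nested procedures in place."""
    if depth > max_depth:
        # Guards against a procedure that (directly or indirectly) calls itself.
        raise RecursionError(f"procedure nesting deeper than {max_depth}")
    for action, params in PROCEDURES[name]:
        if action == "replay_procedure":
            run_procedure(params["name"], trace, depth + 1, max_depth)
        else:
            trace.append(action)  # stand-in for calling a real tool


trace: list[str] = []
run_procedure("deploy_full_stack", trace)
print(trace)
```

The depth limit matters in practice: without it, two procedures that reference each other would recurse until the interpreter stops them.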
