---
title: "Procedural Memory for AI Agents: Learning and Remembering How to Execute Tasks"
description: "Build procedural memory systems that let AI agents record, store, replay, and optimize multi-step task procedures, enabling skill learning and execution improvement over time."
canonical: https://callsphere.ai/blog/procedural-memory-ai-agents-learning-remembering-task-execution
category: "Learn Agentic AI"
tags: ["Procedural Memory", "Skill Learning", "Task Execution", "Python", "Agentic AI"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T20:24:14.735Z
---

# Procedural Memory for AI Agents: Learning and Remembering How to Execute Tasks

> Build procedural memory systems that let AI agents record, store, replay, and optimize multi-step task procedures, enabling skill learning and execution improvement over time.

## Declarative vs Procedural Memory

Most agent memory systems store facts — what the agent knows. "The user's timezone is PST." "The database uses PostgreSQL." This is declarative memory. But agents also need to remember how to do things. How to deploy a service. How to debug a failing test. How to file a bug report in the team's specific format.

Procedural memory stores sequences of actions that accomplish a task. Once an agent successfully completes a complex procedure, it records the steps so it can replay and refine the procedure next time instead of reasoning from scratch.
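
To make the distinction concrete, here is a minimal sketch that puts the two kinds of memory side by side. The task name, action names, and parameters are hypothetical and chosen only for illustration.

```python
# Declarative memory: facts the agent knows (what is true).
declarative_memory = {
    "user_timezone": "PST",
    "database_engine": "PostgreSQL",
}

# Procedural memory: ordered actions the agent knows how to perform.
# The task, action names, and parameters below are hypothetical.
procedural_memory = {
    "file_bug_report": [
        {"action": "collect_logs", "parameters": {"lines": 200}},
        {"action": "fill_template", "parameters": {"template": "team_bug.md"}},
        {"action": "submit_ticket", "parameters": {"project": "BACKEND"}},
    ],
}


def describe(task: str) -> list[str]:
    """Return the ordered action names stored for a task (empty if unknown)."""
    return [step["action"] for step in procedural_memory.get(task, [])]


print(describe("file_bug_report"))
```

A fact lookup returns a value, while a procedure lookup returns an ordered plan. Order matters for the second and not for the first, which is why the two are stored differently.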

## Skill Storage

A procedure is a named sequence of steps, each with an action type, parameters, expected outcomes, and timing metadata.

```mermaid
flowchart TD
    MSG(["New message"])
    WORKING["Working memory<br/>rolling window"]
    EPISODIC[("Episodic memory<br/>past sessions")]
    SEMANTIC[("Semantic memory<br/>facts and preferences")]
    SUM["Summarizer<br/>compresses old turns"]
    ROUTER{"Retrieve<br/>needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater<br/>writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
```

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional
from enum import Enum

class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"

@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    actual_outcome: str = ""
    status: StepStatus = StepStatus.PENDING
    duration_ms: float = 0
    error: str = ""
    notes: str = ""

@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    last_executed: Optional[datetime] = None
    execution_count: int = 0
    success_rate: float = 0.0
    avg_duration_ms: float = 0.0
    tags: list[str] = field(default_factory=list)
    version: int = 1

class ProceduralMemory:
    def __init__(self):
        self.procedures: dict[str, Procedure] = {}
        self.execution_log: list[dict] = []

    def store_procedure(
        self,
        name: str,
        description: str,
        steps: list[dict],
        tags: list[str] | None = None,
    ) -> Procedure:
        proc_steps = [
            ProcedureStep(
                action=s["action"],
                parameters=s.get("parameters", {}),
                expected_outcome=s.get("expected_outcome", ""),
            )
            for s in steps
        ]
        proc = Procedure(
            name=name,
            description=description,
            steps=proc_steps,
            tags=tags or [],
        )
        self.procedures[name] = proc
        return proc
```
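
The store above lives only in process memory, so procedures are lost when the agent restarts. One possible way to persist them is a JSON round trip, sketched below. The helper names `procedure_to_json` and `procedure_from_json` are assumptions, not part of the original design, and the dataclasses are trimmed copies so the example runs on its own.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any


class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    status: StepStatus = StepStatus.PENDING


@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)


def _encode(value: Any) -> Any:
    """Fallback for values json.dumps cannot serialize natively."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def procedure_to_json(proc: Procedure) -> str:
    # asdict() recurses into the nested step dataclasses
    return json.dumps(asdict(proc), default=_encode)


def procedure_from_json(raw: str) -> Procedure:
    data = json.loads(raw)
    steps = [
        ProcedureStep(
            action=s["action"],
            parameters=s["parameters"],
            expected_outcome=s["expected_outcome"],
            status=StepStatus(s["status"]),
        )
        for s in data["steps"]
    ]
    return Procedure(
        name=data["name"],
        description=data["description"],
        steps=steps,
        created_at=datetime.fromisoformat(data["created_at"]),
    )


original = Procedure(
    name="deploy_backend",
    description="Deploy backend service to production",
    steps=[ProcedureStep("run_tests", {"suite": "all"}, "All tests passed")],
)
restored = procedure_from_json(procedure_to_json(original))
```

The same strings could be written to a file or a database column. Parameters must themselves be JSON-serializable for this to work.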

## Procedure Recording

The most natural way to build procedural memory is recording. As the agent executes a task, it logs each step automatically. After successful completion, the recorded steps become a stored procedure.

```python
class ProcedureRecorder:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.steps: list[ProcedureStep] = []
        self.start_time: datetime | None = None

    def start(self):
        self.start_time = datetime.now()
        self.steps = []

    def record_step(
        self,
        action: str,
        parameters: dict,
        outcome: str = "",
        status: StepStatus = StepStatus.SUCCESS,
        duration_ms: float = 0,
    ):
        step = ProcedureStep(
            action=action,
            parameters=parameters,
            actual_outcome=outcome,
            status=status,
            duration_ms=duration_ms,
        )
        self.steps.append(step)

    def finalize(
        self, memory: ProceduralMemory
    ) -> Procedure | None:
        if not self.steps:
            return None

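        # Keep only the steps that succeeded. What actually happened
        # on this run becomes the expected outcome for future replays.
        successful_steps = [
            ProcedureStep(
                action=s.action,
                parameters=s.parameters,
                expected_outcome=s.actual_outcome,
            )
            for s in self.steps
            if s.status == StepStatus.SUCCESS
        ]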

        if not successful_steps:
            return None

        proc = Procedure(
            name=self.name,
            description=self.description,
            steps=successful_steps,
        )
        proc.execution_count = 1
        proc.success_rate = 1.0
        proc.last_executed = datetime.now()
        memory.procedures[self.name] = proc
        return proc
```
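
The recorder accepts a `duration_ms` value but leaves measuring it to the caller. A small context manager is one way to fill it in automatically. This is a sketch under assumptions: `timed_step` is a hypothetical helper, and its `record` callback takes a plain boolean for success, so an adapter would be needed to map that onto `record_step` and `StepStatus`.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def timed_step(
    record: Callable[..., None],
    action: str,
    parameters: dict[str, Any],
) -> Iterator[dict[str, str]]:
    """Time the wrapped block and report the result through `record`."""
    result = {"outcome": ""}  # the caller fills in the outcome text
    start = time.perf_counter()
    try:
        yield result
    except Exception as exc:
        elapsed_ms = (time.perf_counter() - start) * 1000
        record(action, parameters, str(exc), False, elapsed_ms)
        raise  # let the agent's own error handling see the failure
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        record(action, parameters, result["outcome"], True, elapsed_ms)


# A plain list stands in for the recorder in this example
recorded: list[tuple] = []


def record(action, parameters, outcome, succeeded, duration_ms):
    recorded.append((action, parameters, outcome, succeeded, duration_ms))


with timed_step(record, "run_tests", {"suite": "all"}) as step:
    step["outcome"] = "All 142 tests passed"
```

Failed steps are recorded too, with the exception message as the outcome, before the exception propagates.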

## Replay

When the agent encounters a familiar task, it retrieves the stored procedure and replays the steps rather than reasoning from scratch. Each step is executed with the recorded parameters, optionally overridden per action, and its outcome is captured so it can be checked against the recorded expectation.

```python
import time

# Method of ProceduralMemory, shown on its own for readability.
async def replay_procedure(
    self,
    name: str,
    executor,  # async callable: (action, params) -> outcome
    adapt_params: dict | None = None,  # overrides: {action: {param: value}}
) -> dict:
    proc = self.procedures.get(name)
    if not proc:
        return {"success": False, "error": "Procedure not found"}

    results = []
    all_success = True
    total_ms = 0.0

    for i, step in enumerate(proc.steps):
        # Copy so overrides never mutate the stored parameters
        params = dict(step.parameters)
        if adapt_params:
            params.update(adapt_params.get(step.action, {}))

        start = time.perf_counter()  # monotonic, unlike datetime.now()
        try:
            outcome = await executor(step.action, params)
            duration = (time.perf_counter() - start) * 1000
            step.status = StepStatus.SUCCESS
            step.actual_outcome = str(outcome)
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "success",
                "outcome": str(outcome),
                "duration_ms": duration,
            })
        except Exception as e:
            # Later steps still run after a failure; add `break` at the
            # end of this handler to stop at the first failed step.
            duration = (time.perf_counter() - start) * 1000
            all_success = False
            step.status = StepStatus.FAILED
            step.error = str(e)
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "failed",
                "error": str(e),
                "duration_ms": duration,
            })
        # Failed steps count toward the total run time as well
        step.duration_ms = duration
        total_ms += duration

    # Update procedure statistics with running averages
    proc.execution_count += 1
    proc.last_executed = datetime.now()
    total_runs = proc.execution_count
    run_score = 1.0 if all_success else 0.0
    proc.success_rate = (
        (proc.success_rate * (total_runs - 1) + run_score)
        / total_runs
    )
    proc.avg_duration_ms = (
        (proc.avg_duration_ms * (total_runs - 1) + total_ms)
        / total_runs
    )

    return {"success": all_success, "steps": results}
```

## Optimization Over Time

Each execution adds data that can be used to refine the procedure. Steps that consistently fail can be removed or replaced, and slow steps can be flagged for optimization. The agent can also look up similar procedures and keep the variant with the best track record.

```python
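# Methods of ProceduralMemory, shown on their own for readability.
def find_similar(
    self, description: str, threshold: int = 2
) -> list[Procedure]:
    """Find procedures with overlapping keywords."""
    # Plain word overlap: common words such as "to" also count
    query_words = set(description.lower().split())
    results = []
    for proc in self.procedures.values():
        proc_words = set(proc.description.lower().split())
        overlap = len(query_words & proc_words)
        if overlap >= threshold:
            results.append(proc)
    # Most reliable procedures first
    results.sort(key=lambda p: p.success_rate, reverse=True)
    return results

def optimize_procedure(self, name: str) -> Procedure | None:
    proc = self.procedures.get(name)
    if not proc or proc.execution_count < 3:
        return None  # Need enough data to optimize

    # Drop steps whose most recent recorded status is FAILED. This only
    # has an effect if replay writes each step's status back to the
    # procedure; a "fails more often than it succeeds" rule would need
    # per-step success and failure counts.
    optimized_steps = [
        step for step in proc.steps
        if step.status != StepStatus.FAILED
    ]

    # Bump the version only when the procedure actually changed
    if len(optimized_steps) != len(proc.steps):
        proc.steps = optimized_steps
        proc.version += 1
    return proc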
```
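
Removing steps that fail more often than they succeed requires counts per step across runs, which the procedure-level `success_rate` does not provide. The following is a minimal sketch of such a tracker. `StepStats` is a hypothetical helper, and it assumes result dictionaries shaped like the `steps` list returned by `replay_procedure` (an `action` and a `status` key).

```python
from collections import defaultdict


class StepStats:
    """Per-action success and failure counts across replays (hypothetical)."""

    def __init__(self) -> None:
        self.successes: dict[str, int] = defaultdict(int)
        self.failures: dict[str, int] = defaultdict(int)

    def record_run(self, results: list[dict]) -> None:
        """Fold one replay's step results into the running counts."""
        for result in results:
            if result["status"] == "success":
                self.successes[result["action"]] += 1
            else:
                self.failures[result["action"]] += 1

    def unreliable_actions(self, min_runs: int = 3) -> set[str]:
        """Actions seen at least `min_runs` times that fail more than they succeed."""
        flagged = set()
        for action in set(self.successes) | set(self.failures):
            ok = self.successes[action]
            bad = self.failures[action]
            if ok + bad >= min_runs and bad > ok:
                flagged.add(action)
        return flagged


stats = StepStats()
for flaky_status in ["failed", "failed", "success"]:
    stats.record_run([
        {"action": "run_tests", "status": "success"},
        {"action": "warm_cache", "status": flaky_status},
    ])

print(stats.unreliable_actions())
```

The flagged set is a suggestion rather than a verdict. A step that fails often may still be essential, so it is usually safer to route flagged steps to review or replacement than to delete them automatically.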

## Practical Example

```python
memory = ProceduralMemory()

# Record a deployment procedure
recorder = ProcedureRecorder(
    "deploy_backend", "Deploy backend service to production"
)
recorder.start()

recorder.record_step(
    "run_tests", {"suite": "all"}, "All 142 tests passed"
)
recorder.record_step(
    "build_image", {"tag": "v1.2.3"}, "Image built successfully"
)
recorder.record_step(
    "push_image", {"registry": "gcr.io/myproject"}, "Pushed"
)
recorder.record_step(
    "apply_k8s", {"manifest": "deploy.yaml"}, "Rollout started"
)
recorder.record_step(
    "verify_health", {"url": "/health"}, "200 OK"
)

recorder.finalize(memory)

# Next time, replay instead of reasoning from scratch. This must run
# inside an async function, with an async executor(action, params) callable:
# result = await memory.replay_procedure("deploy_backend", executor)
```

## FAQ

### How does procedural memory differ from a simple script?

A script is static: it runs the same steps every time. Procedural memory is adaptive. The agent can override parameters based on context, drop steps that keep failing, and use success rate and duration from past executions to improve the procedure. It is a script that learns from its own execution history.

### When should an agent create a new procedure vs reuse an existing one?

Use the `find_similar` method to check for existing procedures before recording a new one. If a similar procedure exists with a high success rate, replay it with adapted parameters. Create a new procedure only when the task is genuinely novel.

### Can procedures compose — calling one procedure from within another?

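Yes, as long as the executor knows how to dispatch it. Treat each procedure as a callable action: a "deploy_full_stack" procedure can include a step whose action is "replay_procedure" with a parameter naming "deploy_backend", and the executor handles that action by replaying the named procedure. This creates reusable, composable skill libraries.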
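
A minimal sketch of that dispatch, assuming procedures are stored as simple lists of `(action, parameters)` pairs and that the nested procedure is named by a `name` parameter. The `run_procedure` and `run_primitive` functions and the depth limit are hypothetical additions.

```python
import asyncio

# name -> ordered (action, parameters) pairs
procedures = {
    "deploy_backend": [
        ("build_image", {"tag": "v1.2.3"}),
        ("apply_k8s", {"manifest": "deploy.yaml"}),
    ],
    "deploy_full_stack": [
        ("replay_procedure", {"name": "deploy_backend"}),
        ("verify_health", {"url": "/health"}),
    ],
}


async def run_primitive(action: str, params: dict) -> str:
    # A real agent would invoke a tool here
    return f"{action} done"


async def run_procedure(name: str, depth: int = 0, max_depth: int = 5) -> list[str]:
    """Run a procedure, expanding nested 'replay_procedure' steps in place."""
    if depth > max_depth:
        # Guards against procedures that reference each other in a cycle
        raise RecursionError(f"Procedure nesting exceeded {max_depth} levels")
    log: list[str] = []
    for action, params in procedures[name]:
        if action == "replay_procedure":
            log.extend(await run_procedure(params["name"], depth + 1, max_depth))
        else:
            log.append(await run_primitive(action, params))
    return log


log = asyncio.run(run_procedure("deploy_full_stack"))
print(log)
```

The depth limit matters because nothing stops two stored procedures from referencing each other.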

---

Source: https://callsphere.ai/blog/procedural-memory-ai-agents-learning-remembering-task-execution
