
Procedural Memory for AI Agents: Learning and Remembering How to Execute Tasks

Build procedural memory systems that let AI agents record, store, replay, and optimize multi-step task procedures, enabling skill learning and execution improvement over time.

Declarative vs Procedural Memory

Most agent memory systems store facts — what the agent knows. "The user's timezone is PST." "The database uses PostgreSQL." This is declarative memory. But agents also need to remember how to do things. How to deploy a service. How to debug a failing test. How to file a bug report in the team's specific format.

Procedural memory stores sequences of actions that accomplish a task. Once an agent successfully completes a complex procedure, it records the steps so it can replay and refine the procedure next time instead of reasoning from scratch.

Skill Storage

A procedure is a named sequence of steps, each with an action type, parameters, expected outcomes, and timing metadata.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional
from enum import Enum


class StepStatus(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class ProcedureStep:
    action: str
    parameters: dict[str, Any]
    expected_outcome: str = ""
    actual_outcome: str = ""
    status: StepStatus = StepStatus.PENDING
    duration_ms: float = 0
    error: str = ""
    notes: str = ""


@dataclass
class Procedure:
    name: str
    description: str
    steps: list[ProcedureStep] = field(default_factory=list)
    created_at: datetime = field(default_factory=datetime.now)
    last_executed: Optional[datetime] = None
    execution_count: int = 0
    success_rate: float = 0.0
    avg_duration_ms: float = 0.0
    tags: list[str] = field(default_factory=list)
    version: int = 1


class ProceduralMemory:
    def __init__(self):
        self.procedures: dict[str, Procedure] = {}
        self.execution_log: list[dict] = []

    def store_procedure(
        self,
        name: str,
        description: str,
        steps: list[dict],
        tags: list[str] | None = None,
    ) -> Procedure:
        proc_steps = [
            ProcedureStep(
                action=s["action"],
                parameters=s.get("parameters", {}),
                expected_outcome=s.get("expected_outcome", ""),
            )
            for s in steps
        ]
        proc = Procedure(
            name=name,
            description=description,
            steps=proc_steps,
            tags=tags or [],
        )
        self.procedures[name] = proc
        return proc

Procedure Recording

The most natural way to build procedural memory is recording. As the agent executes a task, it logs each step automatically. After successful completion, the recorded steps become a stored procedure.

class ProcedureRecorder:
    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description
        self.steps: list[ProcedureStep] = []
        self.start_time: datetime | None = None

    def start(self):
        self.start_time = datetime.now()
        self.steps = []

    def record_step(
        self,
        action: str,
        parameters: dict,
        outcome: str = "",
        status: StepStatus = StepStatus.SUCCESS,
        duration_ms: float = 0,
    ):
        step = ProcedureStep(
            action=action,
            parameters=parameters,
            actual_outcome=outcome,
            status=status,
            duration_ms=duration_ms,
        )
        self.steps.append(step)

    def finalize(
        self, memory: ProceduralMemory
    ) -> Procedure | None:
        if not self.steps:
            return None

        successful_steps = [
            ProcedureStep(
                action=s.action,
                parameters=s.parameters,
                expected_outcome=s.actual_outcome,
            )
            for s in self.steps
            if s.status == StepStatus.SUCCESS
        ]

        if not successful_steps:
            return None

        proc = Procedure(
            name=self.name,
            description=self.description,
            steps=successful_steps,
        )
        proc.execution_count = 1
        proc.success_rate = 1.0
        proc.last_executed = datetime.now()
        memory.procedures[self.name] = proc
        return proc
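
The recorder above expects the caller to pass `status` and `duration_ms` by hand. One way to capture them automatically is a context manager that times each step and notes whether it raised. This is a minimal sketch, not part of the recorder itself: `timed_step` is a hypothetical helper, and a plain list of dicts stands in for `ProcedureRecorder.steps`.

```python
import time
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def timed_step(
    log: list[dict[str, Any]], action: str, parameters: dict[str, Any]
) -> Iterator[dict[str, Any]]:
    """Time one step and append it to `log` (hypothetical helper)."""
    entry: dict[str, Any] = {
        "action": action,
        "parameters": parameters,
        "status": "success",
        "outcome": "",
        "error": "",
    }
    start = time.perf_counter()
    try:
        yield entry  # the caller may set entry["outcome"] inside the block
    except Exception as exc:
        entry["status"] = "failed"
        entry["error"] = str(exc)
        raise
    finally:
        # Runs on success and on failure, so every step gets a duration.
        entry["duration_ms"] = (time.perf_counter() - start) * 1000
        log.append(entry)


steps: list[dict[str, Any]] = []

with timed_step(steps, "run_tests", {"suite": "all"}) as step:
    step["outcome"] = "All tests passed"

try:
    with timed_step(steps, "build_image", {"tag": "v1.2.3"}):
        raise RuntimeError("registry unreachable")
except RuntimeError:
    pass  # the failure is still recorded in `steps`
```

Each entry carries the same fields that `record_step` accepts, so the log could be forwarded to a recorder once the task finishes.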

Replay

When the agent encounters a familiar task, it retrieves the stored procedure and replays the steps rather than reasoning from scratch. Each step runs with its recorded parameters, which adapt_params can override per action, and the outcome is recorded so it can be checked against the step's expected outcome.


# Method of ProceduralMemory (shown unindented).
async def replay_procedure(
    self,
    name: str,
    executor,  # async callable: (action, params) -> outcome
    adapt_params: dict | None = None,  # per-action overrides: {action: {param: value}}
) -> dict:
    proc = self.procedures.get(name)
    if not proc:
        return {"success": False, "error": "Procedure not found"}

    results = []
    all_success = True
    total_ms = 0

    for i, step in enumerate(proc.steps):
        params = dict(step.parameters)
        if adapt_params:
            params.update(adapt_params.get(step.action, {}))

        start = datetime.now()
        try:
            outcome = await executor(step.action, params)
            duration = (datetime.now() - start).total_seconds() * 1000
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "success",
                "outcome": str(outcome),
                "duration_ms": duration,
            })
            total_ms += duration
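        except Exception as e:
            # Record the failure, count its elapsed time, and stop:
            # later steps usually depend on earlier ones succeeding.
            duration = (datetime.now() - start).total_seconds() * 1000
            total_ms += duration
            all_success = False
            results.append({
                "step": i + 1,
                "action": step.action,
                "status": "failed",
                "error": str(e),
                "duration_ms": duration,
            })
            break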

    # Update procedure statistics
    proc.execution_count += 1
    proc.last_executed = datetime.now()
    total_runs = proc.execution_count
    if all_success:
        proc.success_rate = (
            (proc.success_rate * (total_runs - 1) + 1.0)
            / total_runs
        )
    else:
        proc.success_rate = (
            (proc.success_rate * (total_runs - 1))
            / total_runs
        )
    proc.avg_duration_ms = (
        (proc.avg_duration_ms * (total_runs - 1) + total_ms)
        / total_runs
    )

    return {"success": all_success, "steps": results}
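
The replay code above records each outcome but does not compare it with the step's `expected_outcome`. A minimal sketch of such a check follows. `outcome_matches` is a hypothetical helper, and the case-insensitive substring rule is an assumption. Outcomes that contain counts or timestamps would likely need a looser, task-specific comparison.

```python
def outcome_matches(expected: str, actual: str) -> bool:
    """Return True when the actual outcome is consistent with the expectation.

    Assumptions: an empty expectation accepts anything, and otherwise a
    case-insensitive, whitespace-normalized substring match is sufficient.
    """
    expected_norm = " ".join(expected.lower().split())
    actual_norm = " ".join(actual.lower().split())
    if not expected_norm:
        return True
    return expected_norm in actual_norm


print(outcome_matches("200 OK", "HTTP/1.1 200 ok"))          # True
print(outcome_matches("Rollout started", "Rollout failed"))  # False
print(outcome_matches("", "anything"))                       # True
```

Inside `replay_procedure`, a mismatch could either mark the step as failed or add a flag to the step's result so the agent can decide whether to continue.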

Optimization Over Time

Each execution adds data the agent can use to refine the procedure. Steps that consistently fail can be removed or replaced. Steps that are slow can be flagged for optimization. The agent can also merge similar procedures, keeping the variant with the best success rate.

# Methods of ProceduralMemory (shown unindented).
def find_similar(
    self, description: str, threshold: int = 2
) -> list[Procedure]:
    """Find procedures with overlapping keywords."""
    query_words = set(description.lower().split())
    results = []
    for proc in self.procedures.values():
        proc_words = set(proc.description.lower().split())
        overlap = len(query_words & proc_words)
        if overlap >= threshold:
            results.append(proc)
    results.sort(key=lambda p: p.success_rate, reverse=True)
    return results


def optimize_procedure(self, name: str) -> Procedure | None:
    proc = self.procedures.get(name)
    if not proc or proc.execution_count < 3:
        return None  # Need enough data to optimize

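    # Drop steps whose recorded status is FAILED. This checks a single
    # status per step, not a failure rate, and replay_procedure does not
    # update step.status, so it only has an effect if the caller sets
    # step.status after each run.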
    optimized_steps = []
    for step in proc.steps:
        if step.status != StepStatus.FAILED:
            optimized_steps.append(step)

    proc.steps = optimized_steps
    proc.version += 1
    return proc
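
The `optimize_procedure` method above filters on `step.status`, which holds one status rather than a history. Dropping steps that fail more often than they succeed requires per-step counters. The following is a minimal sketch under stated assumptions: `StepStats`, `update_stats`, and `prune_actions` are hypothetical names, action names are assumed unique within a procedure, and the input is the `steps` list that `replay_procedure` returns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepStats:
    successes: int = 0
    failures: int = 0


def update_stats(stats: dict[str, StepStats], replay_steps: list[dict]) -> None:
    """Fold one replay result (dicts with "action" and "status") into the counters."""
    for result in replay_steps:
        entry = stats.setdefault(result["action"], StepStats())
        if result["status"] == "success":
            entry.successes += 1
        else:
            entry.failures += 1


def prune_actions(actions: list[str], stats: dict[str, StepStats]) -> list[str]:
    """Keep an action unless it has failed more often than it succeeded."""
    kept = []
    for action in actions:
        s = stats.get(action, StepStats())  # no history yet: keep the step
        if s.failures <= s.successes:
            kept.append(action)
    return kept


stats: dict[str, StepStats] = defaultdict(StepStats)
runs = [
    [{"action": "run_tests", "status": "success"},
     {"action": "warm_cache", "status": "failed"}],
    [{"action": "run_tests", "status": "success"},
     {"action": "warm_cache", "status": "failed"}],
    [{"action": "run_tests", "status": "failed"},
     {"action": "warm_cache", "status": "success"}],
]
for run in runs:
    update_stats(stats, run)

kept = prune_actions(["run_tests", "warm_cache", "verify_health"], stats)
print(kept)  # ['run_tests', 'verify_health']
```

Pruning by failure rate suits optional steps. A required step that keeps failing usually means the procedure needs repair, not removal of that step.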

Practical Example

memory = ProceduralMemory()

# Record a deployment procedure
recorder = ProcedureRecorder(
    "deploy_backend", "Deploy backend service to production"
)
recorder.start()

recorder.record_step(
    "run_tests", {"suite": "all"}, "All 142 tests passed"
)
recorder.record_step(
    "build_image", {"tag": "v1.2.3"}, "Image built successfully"
)
recorder.record_step(
    "push_image", {"registry": "gcr.io/myproject"}, "Pushed"
)
recorder.record_step(
    "apply_k8s", {"manifest": "deploy.yaml"}, "Rollout started"
)
recorder.record_step(
    "verify_health", {"url": "/health"}, "200 OK"
)

recorder.finalize(memory)

# Next time — replay instead of reasoning from scratch
# result = await memory.replay_procedure("deploy_backend", executor)
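
The commented-out replay call needs an `executor`, which the replay code describes only as a callable taking `(action, params)` and returning an outcome. The following minimal sketch shows one possible shape. The `HANDLERS` registry and the stub handlers are hypothetical and stand in for real test, build, and deployment tooling.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[str]]


# Hypothetical handlers: each stub stands in for real tooling.
async def run_tests(params: dict[str, Any]) -> str:
    return f"ran suite {params['suite']}"


async def verify_health(params: dict[str, Any]) -> str:
    return f"checked {params['url']}"


HANDLERS: dict[str, Handler] = {
    "run_tests": run_tests,
    "verify_health": verify_health,
}


async def executor(action: str, params: dict[str, Any]) -> str:
    """Dispatch an action name to its handler; unknown actions raise."""
    handler = HANDLERS.get(action)
    if handler is None:
        raise ValueError(f"No handler registered for action: {action}")
    return await handler(params)


outcome = asyncio.run(executor("run_tests", {"suite": "all"}))
print(outcome)  # ran suite all
```

Raising on an unknown action lets the `except` branch in `replay_procedure` mark that step as failed, so a missing handler does not go unnoticed.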

FAQ

How does procedural memory differ from a simple script?

A script is static — it runs the same steps every time. Procedural memory is adaptive. The agent can modify parameters based on context, skip steps that are not needed, and improve the procedure based on execution history. It is a living script that learns.

When should an agent create a new procedure vs reuse an existing one?

Use the find_similar method to check for existing procedures before recording a new one. If a similar procedure exists with a high success rate, replay it with adapted parameters. Create a new procedure only when the task is genuinely novel.

Can procedures compose — calling one procedure from within another?

Yes. Treat each procedure as a callable action. A "deploy_full_stack" procedure can include a step whose action is "replay_procedure" with a parameter of "deploy_backend". This creates reusable, composable skill libraries.
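
A minimal sketch of that composition follows. It makes simplifying assumptions: procedures are reduced to lists of `(action, params)` tuples in a hypothetical `PROCEDURES` dict, the nested procedure is identified by a `name` parameter, and `run_action` is a stub standing in for real tooling. The depth limit guards against procedures that reference each other in a cycle.

```python
import asyncio
from typing import Any

Step = tuple[str, dict[str, Any]]

# Hypothetical, simplified store: procedure name -> ordered steps.
PROCEDURES: dict[str, list[Step]] = {
    "deploy_backend": [
        ("build_image", {"tag": "v1.2.3"}),
        ("apply_k8s", {"manifest": "deploy.yaml"}),
    ],
    "deploy_full_stack": [
        ("replay_procedure", {"name": "deploy_backend"}),
        ("verify_health", {"url": "/health"}),
    ],
}


async def run_action(action: str, params: dict[str, Any]) -> str:
    """Stub for real tooling: only reports which action would have run."""
    return f"{action} done"


async def replay(name: str, depth: int = 0, max_depth: int = 5) -> list[str]:
    """Replay a procedure, expanding nested 'replay_procedure' steps."""
    if depth > max_depth:
        raise RecursionError(f"Procedure nesting too deep at {name!r}")
    outcomes: list[str] = []
    for action, params in PROCEDURES[name]:
        if action == "replay_procedure":
            outcomes.extend(await replay(params["name"], depth + 1, max_depth))
        else:
            outcomes.append(await run_action(action, params))
    return outcomes


trace = asyncio.run(replay("deploy_full_stack"))
print(trace)  # ['build_image done', 'apply_k8s done', 'verify_health done']
```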



Written by

CallSphere Team
