
Memory Search Strategies: Recency, Relevance, and Importance-Weighted Retrieval

Implement and tune multi-signal memory retrieval for AI agents using recency, relevance, and importance scoring functions with combined ranking and parameter tuning strategies.

The Retrieval Quality Problem

An agent's memory is only as good as its retrieval. Storing a thousand perfectly organized memories means nothing if the agent pulls back the wrong five when answering a question. Most naive implementations use a single signal — either recency (most recent first) or relevance (best embedding match). Both fail in predictable ways.

Recency-only retrieval ignores critical old memories. Relevance-only retrieval surfaces stale facts that matched the query words but are no longer accurate. Production agents need multi-signal ranking that balances recency, relevance, and importance.

The Three Scoring Functions

Each signal produces a score between 0 and 1 for every memory candidate.

flowchart TD
    START["Memory Search Strategies: Recency, Relevance, and…"] --> A
    A["The Retrieval Quality Problem"]
    A --> B
    B["The Three Scoring Functions"]
    B --> C
    C["Combined Ranking"]
    C --> D
    D["Tuning the Weights"]
    D --> E
    E["A/B Testing Your Retrieval"]
    E --> F
    F["FAQ"]
    F --> DONE["Key Takeaways"]
    style START fill:#4f46e5,stroke:#4338ca,color:#fff
    style DONE fill:#059669,stroke:#047857,color:#fff

Recency Score

Recency decays exponentially from the memory's last access time. Recent memories score near 1.0, and old memories approach 0.0.

import math
from datetime import datetime
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0


def recency_score(
    memory: Memory,
    now: datetime,
    half_life_hours: float = 24.0,
) -> float:
    hours_elapsed = (
        (now - memory.last_accessed).total_seconds() / 3600
    )
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)

The half-life parameter controls the decay speed. A 24-hour half-life means a memory accessed yesterday gets a recency score of 0.5. A 168-hour half-life (one week) gives the same memory a score of about 0.91.
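A quick sanity check of the half-life behavior, applying the same decay formula to a raw elapsed-hours value:

```python
import math


def recency_demo(hours_elapsed: float, half_life_hours: float) -> float:
    # Same exponential decay as recency_score, without the Memory wrapper
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)


# A memory last accessed 24 hours ago:
print(round(recency_demo(24, 24.0), 2))   # 0.5  (exactly one half-life elapsed)
print(round(recency_demo(24, 168.0), 2))  # 0.91 (one-seventh of a half-life)
```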

Relevance Score

Relevance measures how semantically close a memory is to the current query. In production, this is the cosine similarity between the query embedding and the memory embedding.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(
    memory: Memory,
    query_embedding: list[float],
) -> float:
    sim = cosine_similarity(memory.embedding, query_embedding)
    # Normalize from [-1, 1] to [0, 1]
    return (sim + 1) / 2
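The normalization step matters: orthogonal vectors (similarity 0) land at 0.5, not 0, so an unrelated memory is not scored as actively harmful. A small check with stubbed two-dimensional vectors standing in for real embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q = [0.6, 0.8]               # stubbed query embedding
m_close = [0.6, 0.8]         # identical direction -> similarity 1.0
m_orthogonal = [0.8, -0.6]   # perpendicular -> similarity 0.0

print((cosine_similarity(q, m_close) + 1) / 2)       # 1.0
print((cosine_similarity(q, m_orthogonal) + 1) / 2)  # 0.5
```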

Importance Score

Importance is a property of the memory itself, not the query. It reflects how critical this information is regardless of context. User preferences, explicit instructions, and key decisions have high importance. Transient observations have low importance.

def importance_score(memory: Memory) -> float:
    base = memory.importance
    # Boost based on access frequency
    access_boost = min(memory.access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)
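The two caps are worth tracing: each access adds 0.02 up to a maximum boost of 0.2, and the combined score never exceeds 1.0. A standalone check of the same arithmetic:

```python
def importance(base: float, access_count: int) -> float:
    # Mirrors importance_score: frequency boost capped at 0.2, total at 1.0
    access_boost = min(access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)


print(round(importance(0.5, 0), 2))    # 0.5  (no accesses, base only)
print(round(importance(0.5, 5), 2))    # 0.6  (5 * 0.02 boost)
print(round(importance(0.5, 50), 2))   # 0.7  (boost capped at 0.2)
print(round(importance(0.95, 10), 2))  # 1.0  (total capped at 1.0)
```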

Combined Ranking

The three signals are combined with configurable weights. This lets you tune the retrieval behavior for different use cases.


@dataclass
class RetrievalWeights:
    recency: float = 0.3
    relevance: float = 0.5
    importance: float = 0.2

    def __post_init__(self):
        total = self.recency + self.relevance + self.importance
        self.recency /= total
        self.relevance /= total
        self.importance /= total


def combined_score(
    memory: Memory,
    query_embedding: list[float],
    now: datetime,
    weights: RetrievalWeights,
    half_life_hours: float = 24.0,
) -> float:
    r = recency_score(memory, now, half_life_hours)
    rel = relevance_score(memory, query_embedding)
    imp = importance_score(memory)
    return (
        weights.recency * r
        + weights.relevance * rel
        + weights.importance * imp
    )


def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    weights: RetrievalWeights | None = None,
    top_k: int = 5,
    half_life_hours: float = 24.0,
) -> list[Memory]:
    weights = weights or RetrievalWeights()
    now = datetime.now()
    scored = [
        (
            combined_score(
                m, query_embedding, now, weights, half_life_hours
            ),
            m,
        )
        for m in memories
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    results = []
    for _, mem in scored[:top_k]:
        mem.last_accessed = now
        mem.access_count += 1
        results.append(mem)
    return results
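To see the three signals trade off against each other, here is a compact, self-contained run with stubbed two-dimensional embeddings (a real system would use model embeddings). `Mem` and `score` condense the classes above purely for illustration:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Mem:
    content: str
    embedding: list[float]
    last_accessed: datetime
    importance: float


def score(m, query_emb, now, w=(0.3, 0.5, 0.2), half_life=24.0):
    # Recency: exponential decay from last access
    hours = (now - m.last_accessed).total_seconds() / 3600
    recency = math.exp(-math.log(2) / half_life * hours)
    # Relevance: cosine similarity normalized to [0, 1]
    dot = sum(x * y for x, y in zip(m.embedding, query_emb))
    na = math.sqrt(sum(x * x for x in m.embedding))
    nb = math.sqrt(sum(x * x for x in query_emb))
    relevance = ((dot / (na * nb)) + 1) / 2 if na and nb else 0.5
    return w[0] * recency + w[1] * relevance + w[2] * m.importance


now = datetime(2025, 1, 15, 12, 0)
mems = [
    Mem("User prefers dark mode", [1.0, 0.0], now - timedelta(days=30), 0.9),
    Mem("Discussed dark mode bug", [0.9, 0.1], now - timedelta(hours=2), 0.4),
    Mem("Lunch order: burrito", [0.0, 1.0], now - timedelta(hours=1), 0.1),
]
query = [1.0, 0.0]  # stubbed embedding for a "dark mode" question
ranked = sorted(mems, key=lambda m: score(m, query, now), reverse=True)
print([m.content for m in ranked])
```

The recent, relevant bug discussion ranks first; the month-old preference still beats the fresh but irrelevant lunch order because importance and relevance compensate for its near-zero recency.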

Tuning the Weights

Different agent scenarios need different weight profiles.

Customer support agents should weight importance heavily (0.4) so that account details and policies always surface. Recency matters moderately (0.3) because recent tickets provide context.

Research agents should weight relevance heavily (0.6) since the user is searching for specific knowledge. Recency and importance split the remainder.

Personal assistants should weight recency highly (0.4) because users usually ask about recent events. Importance handles persistent preferences.

# Weight profiles for common scenarios
SUPPORT_WEIGHTS = RetrievalWeights(
    recency=0.3, relevance=0.3, importance=0.4
)
RESEARCH_WEIGHTS = RetrievalWeights(
    recency=0.15, relevance=0.6, importance=0.25
)
ASSISTANT_WEIGHTS = RetrievalWeights(
    recency=0.4, relevance=0.35, importance=0.25
)

A/B Testing Your Retrieval

To tune weights empirically, log what the agent retrieves and whether the user's question was answered successfully. Compare retrieval quality across weight configurations.

@dataclass
class RetrievalLog:
    query: str
    weights_used: RetrievalWeights
    retrieved_ids: list[str]
    user_satisfied: bool | None = None

    def to_dict(self) -> dict:
        return {
            "query": self.query,
            "weights": {
                "recency": self.weights_used.recency,
                "relevance": self.weights_used.relevance,
                "importance": self.weights_used.importance,
            },
            "retrieved_count": len(self.retrieved_ids),
            "satisfied": self.user_satisfied,
        }

Collect these logs, segment by weight configuration, and compare the satisfaction rate. Shift weights toward configurations that produce higher satisfaction.
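The segmentation step is a simple group-by over logged outcomes. A minimal sketch, with hypothetical configuration names and hand-written outcomes standing in for real logs:

```python
from collections import defaultdict

# Hypothetical (config_name, user_satisfied) pairs pulled from RetrievalLog records
logs = [
    ("balanced", True), ("balanced", True), ("balanced", False),
    ("recency_heavy", True), ("recency_heavy", False), ("recency_heavy", False),
]

by_config: dict[str, list[bool]] = defaultdict(list)
for config, satisfied in logs:
    by_config[config].append(satisfied)

# Satisfaction rate per weight configuration
rates = {
    config: sum(outcomes) / len(outcomes)
    for config, outcomes in by_config.items()
}
print(rates)
```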

FAQ

Should the weights be static or adaptive?

Start with static weights tuned per use case. Adaptive weights that shift based on query type add complexity. For example, a question starting with "what did I just say" should boost recency, while "what is our refund policy" should boost importance. Implementing query-type detection is a good optimization once the static baseline works well.
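One way to sketch query-type detection is simple keyword routing. The trigger phrases and weight tuples below are illustrative, not prescriptive; a production version might classify queries with a small model instead:

```python
def weights_for_query(query: str) -> tuple[float, float, float]:
    """Return an illustrative (recency, relevance, importance) profile."""
    q = query.lower()
    # Questions about the immediate conversation -> boost recency
    if any(p in q for p in ("just say", "just now", "a moment ago")):
        return (0.6, 0.3, 0.1)
    # Questions about standing facts or rules -> boost importance
    if any(p in q for p in ("policy", "always", "preference")):
        return (0.1, 0.4, 0.5)
    # Otherwise fall back to the balanced default
    return (0.3, 0.5, 0.2)


print(weights_for_query("What did I just say?"))        # (0.6, 0.3, 0.1)
print(weights_for_query("What is our refund policy?"))  # (0.1, 0.4, 0.5)
```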

What if two memories score identically?

Break ties with creation time — newer memories first. In practice, exact ties are rare because the three signals create a high-resolution scoring space. If you see many ties, your embeddings may lack discriminative power.
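The tie-break falls out of Python's tuple comparison: sort on (score, created_at) descending and newer memories win exact score ties. A small sketch with made-up scores:

```python
from datetime import datetime

# (combined_score, created_at, label) triples with a deliberate score tie
candidates = [
    (0.82, datetime(2025, 1, 10), "older memory"),
    (0.82, datetime(2025, 1, 14), "newer memory"),
    (0.91, datetime(2025, 1, 1), "top memory"),
]

# reverse=True sorts by score descending, then creation time descending
ranked = sorted(candidates, key=lambda c: (c[0], c[1]), reverse=True)
print([label for _, _, label in ranked])
# ['top memory', 'newer memory', 'older memory']
```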

How many memories should I retrieve?

Start with 5 and adjust. Too few and the agent misses context. Too many and you waste context window tokens on low-value memories. Monitor context window utilization and reduce top_k if the agent is frequently truncating.
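An alternative to a fixed top_k is a greedy token budget: keep top-ranked memories until an estimated cost is exhausted. The 4-characters-per-token ratio below is a rough heuristic, not a tokenizer:

```python
def fit_to_budget(
    memory_texts: list[str],
    token_budget: int,
    chars_per_token: int = 4,  # rough heuristic, not a real tokenizer
) -> list[str]:
    """Keep top-ranked memories (already sorted) within a token budget."""
    kept, used = [], 0
    for text in memory_texts:
        cost = len(text) // chars_per_token + 1
        if used + cost > token_budget:
            break
        kept.append(text)
        used += cost
    return kept


mems = ["a" * 400, "b" * 400, "c" * 400]  # ~101 estimated tokens each
print(len(fit_to_budget(mems, token_budget=250)))  # 2
```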


#MemoryRetrieval #SearchRanking #AgentMemory #Python #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team
