---
title: "Memory Search Strategies: Recency, Relevance, and Importance-Weighted Retrieval"
description: "Implement and tune multi-signal memory retrieval for AI agents using recency, relevance, and importance scoring functions with combined ranking and parameter tuning strategies."
canonical: https://callsphere.ai/blog/memory-search-strategies-recency-relevance-importance-weighted-retrieval
category: "Learn Agentic AI"
tags: ["Memory Retrieval", "Search Ranking", "Agent Memory", "Python", "Agentic AI"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:44.539Z
---

# Memory Search Strategies: Recency, Relevance, and Importance-Weighted Retrieval

> Implement and tune multi-signal memory retrieval for AI agents using recency, relevance, and importance scoring functions with combined ranking and parameter tuning strategies.

## The Retrieval Quality Problem

An agent's memory is only as good as its retrieval. Storing a thousand perfectly organized memories means nothing if the agent pulls back the wrong five when answering a question. Most naive implementations use a single signal — either recency (most recent first) or relevance (best embedding match). Both fail in predictable ways.

Recency-only retrieval ignores critical old memories. Relevance-only retrieval surfaces stale facts that matched the query words but are no longer accurate. Production agents need multi-signal ranking that balances recency, relevance, and importance.

## The Three Scoring Functions

Each signal produces a score between 0 and 1 for every memory candidate.

```mermaid
flowchart TD
    QUERY(["Query + memory candidates"])
    REC["Recency score
exponential decay"]
    REL["Relevance score
cosine similarity"]
    IMP["Importance score
stored priority + access boost"]
    COMB{"Weighted
combination"}
    RANK["Top-k ranked memories"]
    QUERY --> REC --> COMB
    QUERY --> REL --> COMB
    QUERY --> IMP --> COMB
    COMB --> RANK
    style COMB fill:#4f46e5,stroke:#4338ca,color:#fff
    style RANK fill:#f59e0b,stroke:#d97706,color:#1f2937
    style REC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style REL fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style IMP fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
```

### Recency Score

Recency decays exponentially from the memory's last access time. Recent memories score near 1.0, and old memories approach 0.0.

```python
import math
from datetime import datetime
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0

def recency_score(
    memory: Memory,
    now: datetime,
    half_life_hours: float = 24.0,
) -> float:
    hours_elapsed = (
        (now - memory.last_accessed).total_seconds() / 3600
    )
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)
```

The half-life parameter controls the decay speed. A 24-hour half-life means a memory accessed yesterday gets a recency score of 0.5. A 168-hour half-life (one week) gives the same memory a score of about 0.91.
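These numbers can be verified directly; here is a small standalone sketch of the same decay formula, with the elapsed hours passed explicitly instead of computed from timestamps:

```python
import math

def recency(hours_elapsed: float, half_life_hours: float) -> float:
    # Exponential decay: score halves every half_life_hours.
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)

# With a 24-hour half-life, each day halves the score.
print(recency(0, 24.0))              # 1.0
print(round(recency(24, 24.0), 3))   # 0.5
print(round(recency(48, 24.0), 3))   # 0.25

# With a one-week half-life, yesterday's memory barely decays.
print(round(recency(24, 168.0), 3))  # 0.906
```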

### Relevance Score

Relevance measures how semantically close a memory is to the current query. In production, this is the cosine similarity between the query embedding and the memory embedding.

```python
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def relevance_score(
    memory: Memory,
    query_embedding: list[float],
) -> float:
    sim = cosine_similarity(memory.embedding, query_embedding)
    # Normalize from [-1, 1] to [0, 1]
    return (sim + 1) / 2
```

### Importance Score

Importance is a property of the memory itself, not the query. It reflects how critical this information is regardless of context. User preferences, explicit instructions, and key decisions have high importance. Transient observations have low importance.

```python
def importance_score(memory: Memory) -> float:
    base = memory.importance
    # Boost based on access frequency
    access_boost = min(memory.access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)
```
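A quick standalone check of how the access boost behaves at its cap — the same arithmetic as `importance_score`, with the base and count passed explicitly:

```python
def importance(base: float, access_count: int) -> float:
    # Frequently accessed memories get a small boost, capped at +0.2;
    # the final score is also clamped to 1.0.
    access_boost = min(access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)

print(importance(0.5, 0))            # 0.5 (no accesses, no boost)
print(round(importance(0.5, 5), 2))  # 0.6
print(importance(0.9, 15))           # 1.0 (boost capped at 0.2, then clamped)
```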

## Combined Ranking

The three signals are combined with configurable weights. This lets you tune the retrieval behavior for different use cases.

```python
@dataclass
class RetrievalWeights:
    recency: float = 0.3
    relevance: float = 0.5
    importance: float = 0.2

    def __post_init__(self):
        total = self.recency + self.relevance + self.importance
        self.recency /= total
        self.relevance /= total
        self.importance /= total

def combined_score(
    memory: Memory,
    query_embedding: list[float],
    now: datetime,
    weights: RetrievalWeights,
    half_life_hours: float = 24.0,
) -> float:
    r = recency_score(memory, now, half_life_hours)
    rel = relevance_score(memory, query_embedding)
    imp = importance_score(memory)
    return (
        weights.recency * r
        + weights.relevance * rel
        + weights.importance * imp
    )

def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    weights: RetrievalWeights | None = None,
    top_k: int = 5,
    half_life_hours: float = 24.0,
) -> list[Memory]:
    weights = weights or RetrievalWeights()
    now = datetime.now()
    scored = [
        (
            combined_score(
                m, query_embedding, now, weights, half_life_hours
            ),
            m,
        )
        for m in memories
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    results = []
    for _, mem in scored[:top_k]:
        mem.last_accessed = now
        mem.access_count += 1
        results.append(mem)
    return results
```
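To see why the combined score beats any single signal, here is a compressed, self-contained sketch of the same math using toy 2-D "embeddings" (the helper `score` condenses the functions above into one expression):

```python
import math
from datetime import datetime, timedelta

def score(emb, last_accessed, importance, query_emb, now,
          w=(0.3, 0.5, 0.2), half_life=24.0):
    # Recency: exponential decay from last access.
    hours = (now - last_accessed).total_seconds() / 3600
    recency = math.exp(-math.log(2) / half_life * hours)
    # Relevance: cosine similarity mapped from [-1, 1] to [0, 1].
    dot = sum(a * b for a, b in zip(emb, query_emb))
    na = math.sqrt(sum(a * a for a in emb))
    nb = math.sqrt(sum(b * b for b in query_emb))
    relevance = (dot / (na * nb) + 1) / 2
    return w[0] * recency + w[1] * relevance + w[2] * importance

now = datetime.now()
query = [1.0, 0.0]
# A brand-new but off-topic memory vs. a two-day-old, on-topic, important one.
fresh_offtopic = score([0.0, 1.0], now, 0.5, query, now)
old_relevant = score([1.0, 0.0], now - timedelta(hours=48), 0.9, query, now)
print(round(fresh_offtopic, 3), round(old_relevant, 3))  # 0.65 0.755
```

The fresh but off-topic memory loses to the older, on-topic, high-importance one — exactly the case that recency-only retrieval gets wrong.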

## Tuning the Weights

Different agent scenarios need different weight profiles.

**Customer support agents** should weight importance heavily (0.4) so that account details and policies always surface. Recency matters moderately (0.3) because recent tickets provide context.

**Research agents** should weight relevance heavily (0.6) since the user is searching for specific knowledge. Recency and importance split the remainder.

**Personal assistants** should weight recency highly (0.4) because users usually ask about recent events. Importance handles persistent preferences.

```python
# Weight profiles for common scenarios
SUPPORT_WEIGHTS = RetrievalWeights(
    recency=0.3, relevance=0.3, importance=0.4
)
RESEARCH_WEIGHTS = RetrievalWeights(
    recency=0.15, relevance=0.6, importance=0.25
)
ASSISTANT_WEIGHTS = RetrievalWeights(
    recency=0.4, relevance=0.35, importance=0.25
)
```
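Because `__post_init__` normalizes, profiles can also be written in any convenient scale; a quick check (repeating the dataclass so the snippet runs standalone):

```python
from dataclasses import dataclass

@dataclass
class RetrievalWeights:
    recency: float = 0.3
    relevance: float = 0.5
    importance: float = 0.2

    def __post_init__(self):
        # Normalize so the three weights always sum to 1.0.
        total = self.recency + self.relevance + self.importance
        self.recency /= total
        self.relevance /= total
        self.importance /= total

# Weights need not sum to 1 when specified; normalization handles it.
w = RetrievalWeights(recency=2, relevance=1, importance=1)
print(w.recency, w.relevance, w.importance)  # 0.5 0.25 0.25
```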

## A/B Testing Your Retrieval

To tune weights empirically, log what the agent retrieves and whether the user's question was answered successfully. Compare retrieval quality across weight configurations.

```python
@dataclass
class RetrievalLog:
    query: str
    weights_used: RetrievalWeights
    retrieved_ids: list[str]
    user_satisfied: bool | None = None

    def to_dict(self) -> dict:
        return {
            "query": self.query,
            "weights": {
                "recency": self.weights_used.recency,
                "relevance": self.weights_used.relevance,
                "importance": self.weights_used.importance,
            },
            "retrieved_count": len(self.retrieved_ids),
            "satisfied": self.user_satisfied,
        }
```

Collect these logs, segment by weight configuration, and compare the satisfaction rate. Shift weights toward configurations that produce higher satisfaction.
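One possible aggregation over log dicts in the shape `to_dict()` produces (only the `weights` and `satisfied` keys are read; the grouping-key format is an arbitrary choice for this sketch):

```python
from collections import defaultdict

def satisfaction_by_config(logs: list[dict]) -> dict[str, float]:
    """Group logs by weight configuration and compute the satisfaction
    rate, skipping logs where the outcome is still unknown."""
    totals = defaultdict(lambda: [0, 0])  # config key -> [satisfied, rated]
    for log in logs:
        if log["satisfied"] is None:
            continue
        w = log["weights"]
        key = f"r{w['recency']:.2f}/rel{w['relevance']:.2f}/imp{w['importance']:.2f}"
        totals[key][0] += int(log["satisfied"])
        totals[key][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}

logs = [
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": True},
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": False},
    {"weights": {"recency": 0.15, "relevance": 0.6, "importance": 0.25},
     "satisfied": True},
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": None},  # unrated, excluded
]
print(satisfaction_by_config(logs))
# {'r0.30/rel0.50/imp0.20': 0.5, 'r0.15/rel0.60/imp0.25': 1.0}
```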

## FAQ

### Should the weights be static or adaptive?

Start with static weights tuned per use case. Adaptive weights that shift based on query type add complexity. For example, a question starting with "what did I just say" should boost recency, while "what is our refund policy" should boost importance. Implementing query-type detection is a good optimization once the static baseline works well.
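A minimal sketch of such query-type detection, with purely illustrative keyword lists (a real implementation might instead classify the query with the LLM itself):

```python
def adaptive_weights(query: str) -> dict[str, float]:
    # Hypothetical keyword heuristic; the phrase lists are illustrative,
    # not exhaustive.
    q = query.lower()
    if any(p in q for p in ("just say", "just said", "a moment ago", "earlier today")):
        return {"recency": 0.6, "relevance": 0.25, "importance": 0.15}
    if any(p in q for p in ("policy", "always", "my preference", "remember that")):
        return {"recency": 0.1, "relevance": 0.4, "importance": 0.5}
    return {"recency": 0.3, "relevance": 0.5, "importance": 0.2}  # static default

print(adaptive_weights("what did I just say about the budget?")["recency"])  # 0.6
print(adaptive_weights("what is our refund policy?")["importance"])          # 0.5
```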

### What if two memories score identically?

Break ties with creation time — newer memories first. In practice, exact ties are rare because the three signals create a high-resolution scoring space. If you see many ties, your embeddings may lack discriminative power.
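The tie-break can live directly in the sort key; a small sketch with `(score, created_at)` pairs standing in for scored memories:

```python
from datetime import datetime, timedelta

now = datetime.now()
scored = [
    (0.8, now - timedelta(days=3)),
    (0.8, now - timedelta(days=1)),   # same score, newer
    (0.9, now - timedelta(days=10)),
]
# Sort by score descending, then creation time descending (newer first).
ranked = sorted(scored, key=lambda x: (x[0], x[1]), reverse=True)
print([round(s, 1) for s, _ in ranked])  # [0.9, 0.8, 0.8]
print(ranked[1][1] > ranked[2][1])       # True: the newer memory wins the tie
```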

### How many memories should I retrieve?

Start with 5 and adjust. Too few and the agent misses context. Too many and you waste context window tokens on low-value memories. Monitor context window utilization and reduce top_k if the agent is frequently truncating.
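One possible refinement is to replace the fixed `top_k` with an estimated token budget; the chars-per-token ratio below is a rough assumption, not a measured value:

```python
def fit_to_budget(ranked_contents: list[str], max_tokens: int,
                  tokens_per_char: float = 0.25) -> list[str]:
    """Greedy sketch: take ranked memories in order until an estimated
    token budget is spent. Assumes ~4 characters per token."""
    out, used = [], 0
    for content in ranked_contents:
        cost = int(len(content) * tokens_per_char) + 1
        if used + cost > max_tokens:
            break
        out.append(content)
        used += cost
    return out

mems = ["short fact", "a much longer memory " * 10, "another short fact"]
print(len(fit_to_budget(mems, max_tokens=20)))  # 1: budget spent after the first
```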


