Building a Translation Memory for AI Agents: Consistent Terminology Across Interactions

The Terminology Consistency Problem

When an AI agent translates "escalation" into Spanish as "escalación" in one response and "derivación" in the next, users lose trust. Inconsistent terminology makes the agent feel unreliable and creates confusion, especially in domains such as healthcare, legal, or financial services, where precise terms carry regulatory weight.

Translation memory (TM) solves this by storing approved translations of terms and phrases, then enforcing their reuse across all agent interactions. This is a standard practice in the professional translation industry, and it applies directly to AI agents.

Term Glossary Data Model

The foundation of translation memory is a structured glossary that maps source terms to approved translations per language.

flowchart TD
    MSG(["New message"])
    WORKING["Working memory<br/>rolling window"]
    EPISODIC[("Episodic memory<br/>past sessions")]
    SEMANTIC[("Semantic memory<br/>facts and preferences")]
    SUM["Summarizer<br/>compresses old turns"]
    ROUTER{"Retrieve<br/>needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater<br/>writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime

@dataclass
class GlossaryEntry:
    term_id: str
    source_term: str
    source_lang: str
    translations: Dict[str, str]  # lang_code -> approved translation
    domain: str  # e.g., "medical", "legal", "general"
    context_note: str = ""
    do_not_translate: bool = False  # Brand names, product names
    created_at: str = ""
    updated_at: str = ""

@dataclass
class Glossary:
    entries: List[GlossaryEntry] = field(default_factory=list)
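    # Internal lookup table: kept out of __init__ and repr so callers only pass entries
    _index: Dict[str, Dict[str, GlossaryEntry]] = field(default_factory=dict, init=False, repr=False)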

    def add_entry(self, entry: GlossaryEntry) -> None:
        self.entries.append(entry)
        # Index by source language and lowercase term
        lang_index = self._index.setdefault(entry.source_lang, {})
        lang_index[entry.source_term.lower()] = entry

    def lookup(self, term: str, source_lang: str = "en") -> Optional[GlossaryEntry]:
        lang_index = self._index.get(source_lang, {})
        return lang_index.get(term.lower())

    def get_translation(self, term: str, target_lang: str, source_lang: str = "en") -> Optional[str]:
        entry = self.lookup(term, source_lang)
        if not entry:
            return None
        if entry.do_not_translate:
            return entry.source_term  # Return as-is
        return entry.translations.get(target_lang)

Translation Cache with Fuzzy Matching

Beyond exact term matches, cache full phrase translations and use fuzzy matching to find similar previously translated segments.

from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional

@dataclass
class TranslationSegment:
    source_text: str
    source_lang: str
    target_text: str
    target_lang: str
    match_score: float  # 1.0 for exact, lower for fuzzy
    domain: str
    last_used: str
    use_count: int = 0

class TranslationMemoryStore:
    def __init__(self, fuzzy_threshold: float = 0.75):
        self.segments: List[TranslationSegment] = []
        self.fuzzy_threshold = fuzzy_threshold
        self._exact_index: Dict[str, TranslationSegment] = {}

    def add_segment(self, segment: TranslationSegment) -> None:
        key = f"{segment.source_lang}:{segment.target_lang}:{segment.source_text.lower()}"
        self._exact_index[key] = segment
        self.segments.append(segment)

    def find_match(
        self, source: str, source_lang: str, target_lang: str
    ) -> Optional[TranslationSegment]:
        # Try exact match first
        key = f"{source_lang}:{target_lang}:{source.lower()}"
        exact = self._exact_index.get(key)
        if exact:
            exact.use_count += 1
            return exact

        # Fuzzy match
        best_match: Optional[TranslationSegment] = None
        best_score = 0.0
        for seg in self.segments:
            if seg.source_lang != source_lang or seg.target_lang != target_lang:
                continue
            score = SequenceMatcher(None, source.lower(), seg.source_text.lower()).ratio()
            if score > best_score and score >= self.fuzzy_threshold:
                best_score = score
                best_match = seg

        if best_match:
            # Return a copy with adjusted score
            return TranslationSegment(
                source_text=best_match.source_text,
                source_lang=best_match.source_lang,
                target_text=best_match.target_text,
                target_lang=best_match.target_lang,
                match_score=best_score,
                domain=best_match.domain,
                last_used=best_match.last_used,
                use_count=best_match.use_count,
            )
        return None
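To get a feel for what a 0.75 threshold admits, the similarity ratio can be computed directly for a few sentences. This is a minimal sketch that uses the same difflib.SequenceMatcher call as find_match; the helper name similarity and the sample sentences are illustrative assumptions, not part of the store above.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1], as used for fuzzy lookups."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

FUZZY_THRESHOLD = 0.75  # same default as TranslationMemoryStore

# Hypothetical stored segment and two incoming sentences
stored = "Your ticket has been escalated to a supervisor."
near = "Your ticket has been escalated to a manager."
far = "Please reset your password."

for candidate in (near, far):
    score = similarity(candidate, stored)
    if score >= FUZZY_THRESHOLD:
        verdict = "fuzzy match"
    else:
        verdict = "no match, translate fresh"
    print(f"{score:.2f}  {verdict}  <- {candidate!r}")
```

Note that a fuzzy hit returns the translation of the stored sentence, not of the new one. In the example, the stored translation would still say "supervisor" where the new sentence says "manager", so anything below a score of 1.0 is probably best treated as a reference for the translator (human or LLM) rather than sent to the user verbatim.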

Consistency Enforcement in Agent Responses

Before sending a response, scan it for terms that have glossary entries and verify they use the approved translation.

import re

class ConsistencyEnforcer:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary

    def check_response(self, response: str, target_lang: str) -> dict:
        """Check response for terminology consistency violations."""
        violations = []
        checked = 0

        for entry in self.glossary.entries:
            approved = entry.translations.get(target_lang)
            # Skip terms with no approved translation and terms that must stay as-is
            if not approved or entry.do_not_translate:
                continue
            checked += 1

            # Whole-term, case-insensitive search so "account" does not match "accountant"
            pattern = re.compile(
                rf"(?<!\w){re.escape(entry.source_term)}(?!\w)", re.IGNORECASE
            )
            if pattern.search(response):
                violations.append({
                    "term": entry.source_term,
                    "expected": approved,
                    "issue": "source term used instead of translation",
                })

        return {
            "consistent": len(violations) == 0,
            "violations": violations,
            "total_checked": checked,  # entries actually checked for this language
        }

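    def enforce(self, response: str, target_lang: str) -> str:
        """Replace untranslated source terms with approved translations."""
        result = response
        # Longest terms first so "escalation policy" is handled before "escalation"
        entries = sorted(
            self.glossary.entries, key=lambda e: len(e.source_term), reverse=True
        )
        for entry in entries:
            if entry.do_not_translate:
                continue
            approved = entry.translations.get(target_lang)
            if not approved:
                continue
            # Whole-term, case-insensitive match; avoids rewriting parts of longer words
            pattern = re.compile(
                rf"(?<!\w){re.escape(entry.source_term)}(?!\w)", re.IGNORECASE
            )
            # A function replacement inserts the translation literally,
            # so backslashes in it are not treated as regex escapes
            result = pattern.sub(lambda _m, repl=approved: repl, result)
        return result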
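The enforcer above looks for source terms left untranslated. A complementary check is to confirm that the approved translation actually appears in the output whenever the source message used a glossary term. The following is a rough sketch under simplifying assumptions: the glossary is a plain dict for one target language, and the function name find_missing_translations and the sample sentences are hypothetical.

```python
import re
from typing import Dict, List

def _contains(term: str, text: str) -> bool:
    """Whole-term, case-insensitive search."""
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    return pattern.search(text) is not None

def find_missing_translations(
    source_text: str, translated_text: str, glossary: Dict[str, str]
) -> List[dict]:
    """Return glossary terms used in the source whose approved
    translation does not appear in the translated text."""
    missing = []
    for source_term, approved in glossary.items():
        if not _contains(source_term, source_text):
            continue  # term not used in this message
        if not _contains(approved, translated_text):
            missing.append({"term": source_term, "expected": approved})
    return missing

# Illustrative English -> Spanish glossary and messages
glossary_es = {"escalation": "escalación", "refund": "reembolso"}
source = "Your escalation is open and the refund is pending."
good = "Su escalación está abierta y el reembolso está pendiente."
bad = "Su derivación está abierta y el reembolso está pendiente."

print(find_missing_translations(source, good, glossary_es))  # []
print(find_missing_translations(source, bad, glossary_es))
```

Exact string matching like this does not recognize inflected forms (plurals, conjugations), so in morphologically rich languages it will likely report false positives. Treat the result as a signal for review rather than a hard failure.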

Glossary-Augmented Translation Prompts

When using an LLM for translation, inject the glossary into the prompt to guide consistent term usage.

class GlossaryAugmentedTranslator:
    def __init__(self, client, glossary: Glossary):
        self.client = client
        self.glossary = glossary

    def _build_glossary_context(self, text: str, target_lang: str) -> str:
        """Extract relevant glossary entries for the text being translated."""
        relevant = []
        text_lower = text.lower()
        for entry in self.glossary.entries:
            if entry.source_term.lower() not in text_lower:
                continue
            # Handle do-not-translate terms first: they usually have no
            # translations, so a translation lookup would skip them
            if entry.do_not_translate:
                relevant.append(f"- '{entry.source_term}' -> DO NOT TRANSLATE (keep as-is)")
                continue
            trans = entry.translations.get(target_lang)
            if trans:
                note = f" ({entry.context_note})" if entry.context_note else ""
                relevant.append(f"- '{entry.source_term}' -> '{trans}'{note}")
        if not relevant:
            return ""
        return "MANDATORY GLOSSARY (use these exact translations):\n" + "\n".join(relevant)

    async def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        glossary_ctx = self._build_glossary_context(text, target_lang)
        system_msg = f"Translate from {source_lang} to {target_lang}."
        if glossary_ctx:
            system_msg += f"\n\n{glossary_ctx}"
        system_msg += "\nPreserve formatting and code blocks."

        resp = await self.client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": text},
            ],
            temperature=0.1,
        )
        return resp.choices[0].message.content or ""
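A glossary in the prompt guides the model but does not guarantee compliance, so it can help to verify the output and retry once with explicit feedback. The sketch below illustrates that loop under stated assumptions: translate_fn is a hypothetical synchronous callable standing in for the LLM call, the glossary is a plain dict for one target language, and fake_translate is a stub used only to make the example runnable.

```python
from typing import Callable, Dict, List

def missing_terms(source: str, translated: str, glossary: Dict[str, str]) -> List[str]:
    """Approved translations that should appear in the output but do not."""
    src, out = source.lower(), translated.lower()
    return [
        approved
        for term, approved in glossary.items()
        if term.lower() in src and approved.lower() not in out
    ]

def translate_with_check(
    text: str,
    glossary: Dict[str, str],
    translate_fn: Callable[[str, str], str],  # (text, extra_instructions) -> translation
    max_retries: int = 1,
) -> str:
    """Translate, verify glossary terms, and retry with explicit feedback."""
    translation = translate_fn(text, "")
    for _ in range(max_retries):
        missing = missing_terms(text, translation, glossary)
        if not missing:
            break
        feedback = "Your previous translation omitted these required terms: " + ", ".join(missing)
        translation = translate_fn(text, feedback)
    return translation

# Stub standing in for the LLM: it ignores the glossary unless reminded
def fake_translate(text: str, extra: str) -> str:
    if "escalación" in extra:
        return "Su escalación está abierta."
    return "Su derivación está abierta."

result = translate_with_check(
    "Your escalation is open.", {"escalation": "escalación"}, fake_translate
)
print(result)  # Su escalación está abierta.
```

If the retry still misses a term, falling back to deterministic replacement or flagging the message for human review are both reasonable options; which one fits depends on how much the domain tolerates automated rewrites.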

Glossary Updates and Versioning

Glossaries evolve as products change. Maintain version history to understand when and why terms were updated.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

@dataclass
class GlossaryChange:
    term_id: str
    field_changed: str
    old_value: str
    new_value: str
    changed_by: str
    changed_at: str  # ISO 8601 timestamp in UTC
    reason: str

class VersionedGlossary:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary
        self.changelog: List[GlossaryChange] = []

    def update_translation(
        self, term_id: str, target_lang: str, new_translation: str,
        changed_by: str, reason: str
    ) -> None:
        # Find the entry by its ID
        entry = next((e for e in self.glossary.entries if e.term_id == term_id), None)
        if entry is None:
            raise ValueError(f"Term {term_id} not found")

        # One timezone-aware UTC timestamp shared by the changelog and the entry
        # (datetime.utcnow() is deprecated as of Python 3.12)
        now = datetime.now(timezone.utc).isoformat()

        old_value = entry.translations.get(target_lang, "")
        self.changelog.append(GlossaryChange(
            term_id=term_id,
            field_changed=f"translations.{target_lang}",
            old_value=old_value,
            new_value=new_translation,
            changed_by=changed_by,
            changed_at=now,
            reason=reason,
        ))
        entry.translations[target_lang] = new_translation
        entry.updated_at = now
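One practical use of the changelog is answering "which translation was in force when this message was sent?". The sketch below shows one possible way to do that. The Change class is a trimmed-down stand-in for GlossaryChange, translation_as_of is a hypothetical helper, and the dates are made up for illustration. It assumes all timestamps are ISO 8601 strings in the same format and timezone, so that plain string comparison orders them correctly.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Change:  # trimmed-down stand-in for GlossaryChange
    term_id: str
    field_changed: str  # e.g. "translations.es"
    old_value: str
    new_value: str
    changed_at: str     # ISO 8601, UTC, uniform format (assumption)

def translation_as_of(
    changelog: List[Change], term_id: str, lang: str, when: str
) -> Optional[str]:
    """Return the translation in force at `when`, or None if the changelog
    holds no change for that term and language up to that time."""
    field_name = f"translations.{lang}"
    relevant = [
        c for c in changelog
        if c.term_id == term_id
        and c.field_changed == field_name
        and c.changed_at <= when
    ]
    if not relevant:
        return None
    # The most recent change at or before `when` wins
    return max(relevant, key=lambda c: c.changed_at).new_value

# Illustrative history for one term
log = [
    Change("t1", "translations.es", "", "derivación", "2025-01-10T09:00:00+00:00"),
    Change("t1", "translations.es", "derivación", "escalación", "2025-03-02T14:30:00+00:00"),
]

print(translation_as_of(log, "t1", "es", "2025-02-01T00:00:00+00:00"))  # derivación
print(translation_as_of(log, "t1", "es", "2025-06-01T00:00:00+00:00"))  # escalación
```

Because the lookup only sees recorded changes, a translation that existed before the first changelog entry is not recoverable this way; logging the initial value as a change when the entry is created would close that gap.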

FAQ

How large should my glossary be before it impacts translation quality?

Start with 50-100 high-impact domain terms. As a rule of thumb, glossaries of up to about 500 entries can be injected whole into LLM translation prompts. Beyond that, include only the entries relevant to the specific text being translated (as the _build_glossary_context method does) so the prompt stays short and the model's context window is not crowded with unused terms.

Should I store the translation memory in a database or in files?

For small-to-medium agents (under 10,000 segments), JSON files versioned in Git work well and keep the translation memory auditable. For larger systems, use a database (for example, PostgreSQL with trigram indexes from the pg_trgm extension for fuzzy matching) and expose the TM through an internal API. The key requirement is that both translators and developers can access and update it.
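For the file-based option, a minimal sketch of a JSON round trip might look like the following. The Entry class is a simplified stand-in for GlossaryEntry, and the function names save_glossary and load_glossary and the file name glossary.json are assumptions made for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List

@dataclass
class Entry:  # simplified stand-in for GlossaryEntry
    term_id: str
    source_term: str
    source_lang: str
    translations: Dict[str, str] = field(default_factory=dict)
    domain: str = "general"

def save_glossary(entries: List[Entry], path: Path) -> None:
    """Write entries as JSON in a stable order so Git diffs stay small."""
    data = [asdict(e) for e in sorted(entries, key=lambda e: e.term_id)]
    # ensure_ascii=False keeps accented characters readable in the file
    text = json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True)
    path.write_text(text + "\n", encoding="utf-8")

def load_glossary(path: Path) -> List[Entry]:
    """Read entries back from the JSON file."""
    return [Entry(**item) for item in json.loads(path.read_text(encoding="utf-8"))]

# Round trip through a temporary file
entries = [
    Entry("t2", "refund", "en", {"es": "reembolso"}, "finance"),
    Entry("t1", "escalation", "en", {"es": "escalación"}, "support"),
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "glossary.json"
    save_glossary(entries, path)
    loaded = load_glossary(path)

print([e.term_id for e in loaded])  # ['t1', 't2']
```

Sorting entries and keys before writing is a design choice aimed at reviewability: a one-term change then shows up as a one-line diff in a pull request.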

How do I handle terms that have multiple valid translations depending on context?

Add context tags to glossary entries. For example, "account" in a banking context translates differently than "account" in a user authentication context. The consistency enforcer should match on both the term and the context tag. When context is ambiguous, flag the term for human review rather than auto-replacing.
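A context-aware lookup along these lines could be sketched as follows. The table layout, the tag names "banking" and "auth", the sample Spanish translations, and the resolve function are all illustrative assumptions rather than part of the glossary classes above.

```python
from typing import Dict, Optional, Tuple

# (source term, context tag) -> {language code: approved translation}
CONTEXT_GLOSSARY: Dict[Tuple[str, str], Dict[str, str]] = {
    ("account", "banking"): {"es": "cuenta bancaria"},
    ("account", "auth"): {"es": "cuenta de usuario"},
    ("refund", "billing"): {"es": "reembolso"},
}

def resolve(
    term: str, target_lang: str, context: Optional[str] = None
) -> Tuple[Optional[str], bool]:
    """Return (translation, needs_review).

    needs_review is True when the term has several context-dependent
    translations and the given context does not pick one of them.
    """
    candidates = {
        ctx: translations[target_lang]
        for (src, ctx), translations in CONTEXT_GLOSSARY.items()
        if src == term.lower() and target_lang in translations
    }
    if context in candidates:
        return candidates[context], False
    if len(candidates) == 1:
        # Only one meaning on record, so no ambiguity to resolve
        return next(iter(candidates.values())), False
    # Either unknown (no candidates) or ambiguous (several candidates)
    return None, bool(candidates)

print(resolve("account", "es", "banking"))  # ('cuenta bancaria', False)
print(resolve("account", "es"))             # (None, True)
print(resolve("refund", "es"))              # ('reembolso', False)
```

Returning a needs_review flag instead of guessing keeps the ambiguous case visible to the caller, which can then route the message to a human reviewer or ask the LLM to choose with the surrounding text as context.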
