Building a Translation Memory for AI Agents: Consistent Terminology Across Interactions

Implement translation memory systems with term glossaries, translation caching, and consistency enforcement to maintain uniform terminology across all AI agent interactions.

The Terminology Consistency Problem

When an AI agent translates "escalation" as "escalación" in one response and "derivación" in the next, users lose trust. Inconsistent terminology makes the agent feel unreliable and creates confusion, especially in domain-specific contexts like healthcare, legal, or financial services where precise terms carry regulatory weight.

Translation memory (TM) solves this by storing approved translations of terms and phrases, then enforcing their reuse across all agent interactions. This is a standard practice in the professional translation industry, and it applies directly to AI agents.

Term Glossary Data Model

The foundation of translation memory is a structured glossary that maps source terms to approved translations per language.

from dataclasses import dataclass, field
from typing import Dict, List, Optional
from datetime import datetime

@dataclass
class GlossaryEntry:
    term_id: str
    source_term: str
    source_lang: str
    translations: Dict[str, str]  # lang_code -> approved translation
    domain: str  # e.g., "medical", "legal", "general"
    context_note: str = ""
    do_not_translate: bool = False  # Brand names, product names
    created_at: str = ""
    updated_at: str = ""

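@dataclass
class Glossary:
    entries: List[GlossaryEntry] = field(default_factory=list)
    # Internal index: source_lang -> lowercase source term -> entry.
    # Excluded from __init__/repr so callers cannot pass an index that disagrees with `entries`.
    _index: Dict[str, Dict[str, GlossaryEntry]] = field(default_factory=dict, init=False, repr=False)

    def __post_init__(self) -> None:
        # Index entries passed to the constructor, not only those added later
        for entry in self.entries:
            self._index_entry(entry)

    def _index_entry(self, entry: GlossaryEntry) -> None:
        lang_index = self._index.setdefault(entry.source_lang, {})
        lang_index[entry.source_term.lower()] = entry

    def add_entry(self, entry: GlossaryEntry) -> None:
        self.entries.append(entry)
        self._index_entry(entry)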

    def lookup(self, term: str, source_lang: str = "en") -> Optional[GlossaryEntry]:
        lang_index = self._index.get(source_lang, {})
        return lang_index.get(term.lower())

    def get_translation(self, term: str, target_lang: str, source_lang: str = "en") -> Optional[str]:
        entry = self.lookup(term, source_lang)
        if not entry:
            return None
        if entry.do_not_translate:
            return entry.source_term  # Return as-is
        return entry.translations.get(target_lang)
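
`lookup` expects a term that has already been isolated; to find which glossary terms occur in a running text, the source terms have to be matched against that text. A minimal sketch of one way to do it, assuming whole-word, case-insensitive matching is what you want (the helper names `build_term_matcher` and `find_terms` are hypothetical, not part of the classes above):

```python
import re
from typing import Iterable, List

def build_term_matcher(terms: Iterable[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive pattern matching any glossary term as a whole word."""
    # Longest terms first: regex alternation takes the first alternative that matches,
    # so "account manager" must be tried before "account".
    ordered = sorted({t.lower() for t in terms}, key=len, reverse=True)
    if not ordered:
        return re.compile(r"(?!)")  # empty glossary: a pattern that never matches
    alternation = "|".join(re.escape(t) for t in ordered)
    # (?<!\w) and (?!\w) act like \b but also work for terms starting/ending in punctuation
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)

def find_terms(text: str, matcher: "re.Pattern[str]") -> List[str]:
    """Return the glossary terms found in `text`, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in matcher.finditer(text)]

matcher = build_term_matcher(["account", "account manager", "escalation"])
print(find_terms("Your Account Manager approved the escalation; your accountant did not.", matcher))
# ['account manager', 'escalation']
```

The lookarounds keep "account" from matching inside "accountant", a false positive that a plain substring test would report.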

Translation Cache with Fuzzy Matching

Beyond exact term matches, cache full phrase translations and use fuzzy matching to find similar previously translated segments.

from dataclasses import replace
from datetime import datetime, timezone
from difflib import SequenceMatcher

@dataclass
class TranslationSegment:
    source_text: str
    source_lang: str
    target_text: str
    target_lang: str
    match_score: float  # 1.0 for exact, lower for fuzzy
    domain: str
    last_used: str
    use_count: int = 0

class TranslationMemoryStore:
    def __init__(self, fuzzy_threshold: float = 0.75):
        self.segments: List[TranslationSegment] = []
        self.fuzzy_threshold = fuzzy_threshold
        self._exact_index: Dict[str, TranslationSegment] = {}

    @staticmethod
    def _key(text: str, source_lang: str, target_lang: str) -> str:
        # Ignore case and surrounding whitespace so trivial variants hit the exact index
        return f"{source_lang}:{target_lang}:{text.strip().lower()}"

    def add_segment(self, segment: TranslationSegment) -> None:
        key = self._key(segment.source_text, segment.source_lang, segment.target_lang)
        previous = self._exact_index.get(key)
        if previous is not None:
            # Replace instead of duplicating, so the fuzzy scan never sees a stale translation
            self.segments.remove(previous)
        self._exact_index[key] = segment
        self.segments.append(segment)

    def find_match(
        self, source: str, source_lang: str, target_lang: str
    ) -> Optional[TranslationSegment]:
        # Try exact match first
        exact = self._exact_index.get(self._key(source, source_lang, target_lang))
        if exact is not None:
            exact.use_count += 1
            exact.last_used = datetime.now(timezone.utc).isoformat()
            return exact

        # Fuzzy match: a linear scan, adequate for small stores
        needle = source.strip().lower()
        best_match: Optional[TranslationSegment] = None
        best_score = 0.0
        for seg in self.segments:
            if seg.source_lang != source_lang or seg.target_lang != target_lang:
                continue
            score = SequenceMatcher(None, needle, seg.source_text.strip().lower()).ratio()
            if score > best_score and score >= self.fuzzy_threshold:
                best_score = score
                best_match = seg

        if best_match is not None:
            # Return a copy carrying the fuzzy score; the stored segment is left unchanged
            return replace(best_match, match_score=best_score)
        return None
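
The store normalizes very little before lookup, so `Reset your password.` and `reset your  password` still miss the exact index and fall through to the slower fuzzy scan. The sketch below shows one possible normalization step plus ranked candidates from the standard library's `difflib.get_close_matches`; the normalization rules and the names `normalize_segment` and `fuzzy_candidates` are illustrative assumptions, and the rules should be tuned per language:

```python
import difflib
import re
import unicodedata
from typing import Dict, List, Tuple

def normalize_segment(text: str) -> str:
    """Canonical lookup key for a TM segment (illustrative rules)."""
    text = unicodedata.normalize("NFC", text)  # unify composed/decomposed accented characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    text = text.rstrip(".!?;:")                # ignore trailing sentence punctuation
    return text.lower()

def fuzzy_candidates(
    source: str, memory: Dict[str, str], n: int = 3, cutoff: float = 0.75
) -> List[Tuple[str, str, float]]:
    """Return up to n (stored_key, translation, score) tuples, best first.

    `memory` maps normalized source segments to approved target segments.
    """
    key = normalize_segment(source)
    if key in memory:
        return [(key, memory[key], 1.0)]  # exact hit after normalization
    matches = difflib.get_close_matches(key, list(memory), n=n, cutoff=cutoff)
    return [(m, memory[m], difflib.SequenceMatcher(None, key, m).ratio()) for m in matches]

memory = {
    normalize_segment("Reset your password."): "Restablezca su contraseña.",
    normalize_segment("Change your email"): "Cambie su correo electrónico",
}
print(fuzzy_candidates("  reset your   PASSWORD ", memory))
print(fuzzy_candidates("Reset your passwords", memory))
```

Keys are stored in normalized form while the target text stays exactly as approved. Returning several scored candidates lets a reviewer or the agent choose among close matches instead of trusting only the single best one.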

Consistency Enforcement in Agent Responses

Before sending a response, scan it for glossary terms that were left in the source language, and replace them with the approved translation.

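import re

class ConsistencyEnforcer:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary

    @staticmethod
    def _term_pattern(term: str) -> "re.Pattern[str]":
        # Whole-word, case-insensitive match: "account" must not match inside "accountant".
        # The lookarounds behave like \b but also work for terms that start or end with punctuation.
        return re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)

    def check_response(self, response: str, target_lang: str) -> dict:
        """Flag glossary terms left untranslated (in the source language) in the response."""
        violations = []
        for entry in self.glossary.entries:
            if entry.do_not_translate:
                continue
            approved = entry.translations.get(target_lang)
            # Skip terms without an approved translation, or whose translation equals the source
            if not approved or approved.lower() == entry.source_term.lower():
                continue
            if self._term_pattern(entry.source_term).search(response):
                violations.append({
                    "term": entry.source_term,
                    "expected": approved,
                    "issue": "source term used instead of translation",
                })

        return {
            "consistent": len(violations) == 0,
            "violations": violations,
            "total_checked": len(self.glossary.entries),
        }

    def enforce(self, response: str, target_lang: str) -> str:
        """Replace untranslated glossary source terms with their approved translations."""
        # Longest terms first so multi-word terms ("account manager") win over sub-terms ("account")
        entries = sorted(self.glossary.entries, key=lambda e: len(e.source_term), reverse=True)
        result = response
        for entry in entries:
            if entry.do_not_translate:
                continue
            approved = entry.translations.get(target_lang)
            if not approved:
                continue
            # A function replacement inserts `approved` literally (no backslash or group escapes)
            result = self._term_pattern(entry.source_term).sub(lambda _m, a=approved: a, result)
        return result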
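
`check_response` catches source terms left untranslated, but not the failure described at the start of this article, where the agent picks a non-approved translation such as "derivación". One way to catch that is sketched below, under the assumption that you maintain a list of known rejected variants per term; the `rejected` field and all names here are hypothetical:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TermRule:
    source_term: str
    approved: str  # approved translation in one target language
    # Hypothetical field: known non-approved renderings to reject
    rejected: List[str] = field(default_factory=list)

def _whole_word(term: str) -> "re.Pattern[str]":
    # Case-insensitive, whole-word match; \w is Unicode-aware, so accented letters count
    return re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)

def find_variant_violations(response: str, rules: List[TermRule]) -> List[Dict[str, object]]:
    """Report each occurrence of a rejected variant with its character offset."""
    violations: List[Dict[str, object]] = []
    for rule in rules:
        for variant in rule.rejected:
            for m in _whole_word(variant).finditer(response):
                violations.append({
                    "term": rule.source_term,
                    "found": m.group(0),
                    "expected": rule.approved,
                    "position": m.start(),
                })
    return sorted(violations, key=lambda v: v["position"])

def replace_variants(response: str, rules: List[TermRule]) -> str:
    """Swap rejected variants for the approved term."""
    for rule in rules:
        for variant in rule.rejected:
            response = _whole_word(variant).sub(lambda _m, a=rule.approved: a, response)
    return response

rules = [TermRule("escalation", "escalación", rejected=["derivación", "escalamiento"])]
text = "Su caso requiere una derivación inmediata."
print(find_variant_violations(text, rules))
print(replace_variants(text, rules))  # Su caso requiere una escalación inmediata.
```

Automatic replacement is only safe when the variant and the approved term share grammatical gender and number; in heavily inflected languages it is usually better to report the violation and regenerate the response.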

Glossary-Augmented Translation Prompts

When using an LLM for translation, inject the glossary into the prompt to guide consistent term usage.

class GlossaryAugmentedTranslator:
    def __init__(self, client, glossary: Glossary, model: str = "gpt-4o"):
        # `client` is assumed to be an openai.AsyncOpenAI instance, since the call below is awaited
        self.client = client
        self.glossary = glossary
        self.model = model

    def _build_glossary_context(self, text: str, target_lang: str) -> str:
        """Extract relevant glossary entries for the text being translated."""
        relevant = []
        lowered = text.lower()
        for entry in self.glossary.entries:
            # Loose substring test: an extra glossary line in the prompt costs little
            if entry.source_term.lower() not in lowered:
                continue
            # Check do_not_translate first: such entries usually have no translations at all
            if entry.do_not_translate:
                relevant.append(f"- '{entry.source_term}' -> DO NOT TRANSLATE (keep as-is)")
                continue
            trans = entry.translations.get(target_lang)
            if trans:
                note = f" ({entry.context_note})" if entry.context_note else ""
                relevant.append(f"- '{entry.source_term}' -> '{trans}'{note}")
        if not relevant:
            return ""
        return "MANDATORY GLOSSARY (use these exact translations):\n" + "\n".join(relevant)

    async def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        glossary_ctx = self._build_glossary_context(text, target_lang)
        system_msg = f"Translate from {source_lang} to {target_lang}."
        if glossary_ctx:
            system_msg += f"\n\n{glossary_ctx}"
        system_msg += "\nPreserve formatting and code blocks. Return only the translated text."

        resp = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_msg},
                {"role": "user", "content": text},
            ],
            temperature=0.1,  # low temperature for more repeatable wording
        )
        return resp.choices[0].message.content or ""
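
A glossary in the system prompt makes consistent terminology more likely, but the model can still ignore it. A cheap post-translation check, sketched here with a plain `dict` standing in for the glossary (the function names are hypothetical), lists the terms the model was told to use and did not:

```python
import re
from typing import Dict, List, Tuple

def _contains_term(text: str, term: str) -> bool:
    # Whole-word, case-insensitive search
    return re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text, re.IGNORECASE) is not None

def missing_glossary_terms(
    source: str, translation: str, glossary: Dict[str, str]
) -> List[Tuple[str, str]]:
    """List (source_term, approved) pairs present in the source but not honored in the output.

    `glossary` maps source terms to approved translations for one target language.
    """
    return [
        (src, approved)
        for src, approved in glossary.items()
        if _contains_term(source, src) and not _contains_term(translation, approved)
    ]

glossary = {"escalation": "escalación", "ticket": "ticket"}
source = "Your escalation ticket is open."
translation = "Su ticket de derivación está abierto."
print(missing_glossary_terms(source, translation, glossary))
# [('escalation', 'escalación')]
```

Because this is exact string matching, an inflected form such as a plural is also reported, so treat a non-empty result as a signal to retry once with the missing terms restated, or to send the output for review, rather than as a hard failure.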

Glossary Updates and Versioning

Glossaries evolve as products change. Maintain version history to understand when and why terms were updated.

from datetime import datetime, timezone

@dataclass
class GlossaryChange:
    term_id: str
    field_changed: str
    old_value: str
    new_value: str
    changed_by: str
    changed_at: str  # ISO 8601 timestamp (UTC)
    reason: str

class VersionedGlossary:
    def __init__(self, glossary: Glossary):
        self.glossary = glossary
        self.changelog: List[GlossaryChange] = []

    def update_translation(
        self, term_id: str, target_lang: str, new_translation: str,
        changed_by: str, reason: str
    ) -> None:
        entry = next((e for e in self.glossary.entries if e.term_id == term_id), None)
        if entry is None:
            raise ValueError(f"Term {term_id} not found")

        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp,
        # shared by the changelog record and the entry so the two always agree
        now = datetime.now(timezone.utc).isoformat()
        old_value = entry.translations.get(target_lang, "")  # "" = no previous translation
        self.changelog.append(GlossaryChange(
            term_id=term_id,
            field_changed=f"translations.{target_lang}",
            old_value=old_value,
            new_value=new_translation,
            changed_by=changed_by,
            changed_at=now,
            reason=reason,
        ))
        entry.translations[target_lang] = new_translation
        entry.updated_at = now
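
The changelog also makes it possible to answer "which translation was approved when this response was sent?" A sketch that replays the changelog for one term and language, assuming `changed_at` has been parsed into a timezone-aware `datetime` (ISO strings convert with `datetime.fromisoformat`); `Change` is a trimmed, hypothetical stand-in for `GlossaryChange`:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class Change:
    term_id: str
    field_changed: str  # e.g. "translations.es"
    old_value: str      # "" when no translation existed before
    new_value: str
    changed_at: datetime

def translation_as_of(
    changelog: List[Change], term_id: str, target_lang: str,
    as_of: datetime, current: Optional[str],
) -> Optional[str]:
    """Reconstruct the approved translation in force at `as_of` by replaying the changelog."""
    field_name = f"translations.{target_lang}"
    history = sorted(
        (c for c in changelog if c.term_id == term_id and c.field_changed == field_name),
        key=lambda c: c.changed_at,
    )
    if not history:
        return current  # never changed: the current value has always applied
    value = history[0].old_value  # value before the first recorded change
    for change in history:
        if change.changed_at > as_of:
            break
        value = change.new_value
    return value or None  # "" means no translation existed yet

log = [
    Change("esc", "translations.es", "derivación", "escalación",
           datetime(2024, 3, 1, tzinfo=timezone.utc)),
    Change("esc", "translations.fr", "", "escalade",
           datetime(2024, 5, 1, tzinfo=timezone.utc)),
]
print(translation_as_of(log, "esc", "es", datetime(2024, 1, 1, tzinfo=timezone.utc), "escalación"))
```

Pairing this with a logged response timestamp gives an audit trail: whether a given response used the terminology that was approved at the time.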

FAQ

How large should my glossary be before it impacts translation quality?

Start with 50-100 high-impact domain terms. Glossaries of up to about 500 entries work well when injected into LLM translation prompts. Beyond that, filter the glossary down to the entries relevant to the specific text being translated (as the _build_glossary_context method does) to keep prompts short and the mandatory terms from being buried among irrelevant ones.

Should I store the translation memory in a database or in files?

For small-to-medium agents (under 10,000 segments), JSON files versioned in Git work well and keep the translation memory auditable. For larger systems, use a database (PostgreSQL with trigram indexes for fuzzy matching) and expose the TM through an internal API. The key requirement is that translators and developers can both access and update it.
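
For the file-based option, the main requirement is deterministic serialization, so that a Git diff shows only real terminology changes. A minimal sketch using the standard `json` module; `Entry` is a trimmed, hypothetical stand-in for `GlossaryEntry`:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

@dataclass
class Entry:
    """Trimmed, hypothetical stand-in for GlossaryEntry."""
    term_id: str
    source_term: str
    source_lang: str
    translations: Dict[str, str] = field(default_factory=dict)
    domain: str = "general"
    do_not_translate: bool = False

def glossary_to_json(entries: List[Entry]) -> str:
    """Serialize deterministically: stable entry order, stable key order, trailing newline."""
    data = [asdict(e) for e in sorted(entries, key=lambda e: e.term_id)]
    # ensure_ascii=False keeps "contraseña" readable in diffs instead of \u escapes
    return json.dumps(data, ensure_ascii=False, indent=2, sort_keys=True) + "\n"

def glossary_from_json(text: str) -> List[Entry]:
    return [Entry(**item) for item in json.loads(text)]

entries = [
    Entry("t2", "password", "en", {"es": "contraseña"}),
    Entry("t1", "AcmePay", "en", do_not_translate=True),
]
print(glossary_to_json(entries))
```

Writing the result with `Path("glossary.json").write_text(glossary_to_json(entries), encoding="utf-8")` gives one reviewable file that translators and developers can both edit through pull requests.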

How do I handle terms that have multiple valid translations depending on context?

Add context tags to glossary entries. For example, "account" in a banking context translates differently than "account" in a user authentication context. The consistency enforcer should match on both the term and the context tag. When context is ambiguous, flag the term for human review rather than auto-replacing.
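
A sketch of context-tag resolution, assuming the glossary for one target language is keyed by (source term, context tag); the German senses of "account" and the names `SENSES`, `Resolution`, and `resolve_term` are illustrative:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical sense-tagged glossary for German:
# (lowercased source term, context tag) -> approved translation
SENSES: Dict[Tuple[str, str], str] = {
    ("account", "banking"): "Konto",
    ("account", "auth"): "Benutzerkonto",
    ("invoice", "billing"): "Rechnung",
}

@dataclass
class Resolution:
    translation: Optional[str]
    needs_review: bool
    reason: str = ""

def resolve_term(
    term: str, context: Optional[str], senses: Dict[Tuple[str, str], str]
) -> Resolution:
    key = term.lower()
    if context is not None and (key, context) in senses:
        return Resolution(senses[(key, context)], needs_review=False, reason="exact sense")
    candidates = {ctx: tr for (t, ctx), tr in senses.items() if t == key}
    if not candidates:
        return Resolution(None, needs_review=False, reason="not in glossary")
    distinct = set(candidates.values())
    if len(distinct) == 1:
        # Every known sense translates the same way, so the context tag does not matter
        return Resolution(distinct.pop(), needs_review=False, reason="single translation")
    # Senses disagree and no tag selected one: flag for a human instead of guessing
    return Resolution(None, needs_review=True, reason=f"ambiguous senses: {sorted(candidates)}")

print(resolve_term("Account", "banking", SENSES))
print(resolve_term("account", None, SENSES))
```

When all senses of a term translate the same way, the tag is irrelevant and no review is needed; only genuinely conflicting senses without a matching tag are escalated to a human.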

