---
title: "Service Discovery for AI Agent Microservices: Consul, Kubernetes DNS, and Eureka"
description: "Implement service discovery for AI agent microservices using Kubernetes DNS, Consul, and Eureka. Learn health checking, load balancing, and failover strategies that keep agent systems resilient."
canonical: https://callsphere.ai/blog/service-discovery-ai-agent-microservices-consul-kubernetes
category: "Learn Agentic AI"
tags: ["Service Discovery", "Kubernetes", "Consul", "Microservices", "Agentic AI"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:44.219Z
---

# Service Discovery for AI Agent Microservices: Consul, Kubernetes DNS, and Eureka

> Implement service discovery for AI agent microservices using Kubernetes DNS, Consul, and Eureka. Learn health checking, load balancing, and failover strategies that keep agent systems resilient.

## The Service Discovery Problem in Agent Systems

In a monolithic agent, every component is reachable through a function call. When you decompose into microservices, the conversation manager needs to find the RAG service, the tool execution engine, and the memory store. These services may have multiple replicas, they may restart and get new IP addresses, and new instances may spin up during load spikes.

Hardcoding IP addresses or hostnames in configuration files breaks the moment a pod restarts. Service discovery is the mechanism that lets services find each other dynamically.

## Kubernetes DNS: The Zero-Config Option

If your agent system runs on Kubernetes, you get service discovery out of the box. Every Kubernetes Service object creates a DNS entry that other pods can resolve:

```mermaid
flowchart LR
    CM(["Conversation manager pod"])
    DNS["CoreDNS
rag-retrieval.agent-system.svc"]
    SVC["Service rag-retrieval
ClusterIP"]
    EP["Endpoints
ready pods only"]
    P1(("pod 1"))
    P2(("pod 2"))
    P3(("pod 3"))
    RP["Readiness probes"]
    CM -->|"resolve name"| DNS
    DNS -->|"ClusterIP"| CM
    CM --> SVC --> EP
    EP --> P1
    EP --> P2
    EP --> P3
    RP -->|"pass or fail"| EP
    style DNS fill:#4f46e5,stroke:#4338ca,color:#fff
    style EP fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style CM fill:#059669,stroke:#047857,color:#fff
```

```yaml
# rag-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rag-retrieval
  namespace: agent-system
spec:
  selector:
    app: rag-retrieval
  ports:
    - port: 8002
      targetPort: 8002
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-retrieval
  namespace: agent-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-retrieval
  template:
    metadata:
      labels:
        app: rag-retrieval
    spec:
      containers:
        - name: app
          image: agent-system/rag-retrieval:v2.1
          ports:
            - containerPort: 8002
          readinessProbe:
            httpGet:
              path: /health
              port: 8002
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8002
            initialDelaySeconds: 15
            periodSeconds: 20
```

Any pod in the `agent-system` namespace can reach the RAG service at `http://rag-retrieval:8002`; pods in other namespaces use the namespaced form, `http://rag-retrieval.agent-system:8002`. kube-proxy distributes connections across the three replicas, and the readiness probe ensures that traffic only reaches pods that are actually ready to serve requests.
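The short name works because Kubernetes injects search domains into each pod's `/etc/resolv.conf`. A small helper (hypothetical, not part of any client library) makes the equivalent name forms explicit:

```python
def k8s_dns_names(service: str, namespace: str,
                  cluster_domain: str = "cluster.local") -> list[str]:
    """Return the DNS names under which a ClusterIP Service resolves.

    The short form only resolves from pods in the same namespace; the
    namespaced and fully qualified forms work cluster-wide.
    """
    return [
        service,                                        # same namespace only
        f"{service}.{namespace}",                       # cross-namespace
        f"{service}.{namespace}.svc.{cluster_domain}",  # fully qualified
    ]

print(k8s_dns_names("rag-retrieval", "agent-system"))
# ['rag-retrieval', 'rag-retrieval.agent-system',
#  'rag-retrieval.agent-system.svc.cluster.local']
```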

In the conversation manager's configuration, the service URL is simply a Kubernetes DNS name:

```python
import os

import httpx

class ServiceConfig:
    RAG_SERVICE_URL = os.getenv(
        "RAG_SERVICE_URL", "http://rag-retrieval:8002"
    )
    TOOL_SERVICE_URL = os.getenv(
        "TOOL_SERVICE_URL", "http://tool-execution:8001"
    )
    MEMORY_SERVICE_URL = os.getenv(
        "MEMORY_SERVICE_URL", "http://memory-service:8003"
    )

class ServiceClient:
    def __init__(self, config: ServiceConfig):
        self.config = config
        self._client = httpx.AsyncClient(timeout=10.0)

    async def retrieve_context(self, query: str, top_k: int = 5):
        resp = await self._client.post(
            f"{self.config.RAG_SERVICE_URL}/retrieve",
            json={"query": query, "top_k": top_k},
        )
        resp.raise_for_status()
        return resp.json()
```

## Health Checking Patterns

Health checks are the foundation of service discovery. A service that registers itself but cannot serve requests is worse than a service that is not registered at all. Implement two health check endpoints:

```python
from datetime import datetime

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# vector_store and embedding_model are assumed module-level clients,
# initialized in the startup hook below.
startup_time = datetime.utcnow()
is_ready = False

@app.get("/health/live")
async def liveness():
    """Am I running? Returns 200 if the process is alive."""
    return {"status": "alive", "uptime_seconds": (
        datetime.utcnow() - startup_time
    ).total_seconds()}

@app.get("/health/ready")
async def readiness():
    """Can I serve traffic? Checks startup completion and all dependencies."""
    if not is_ready:
        return JSONResponse(
            status_code=503,
            content={"status": "not_ready", "checks": {"startup": "incomplete"}},
        )
    checks = {}
    try:
        await vector_store.ping()
        checks["vector_store"] = "ok"
    except Exception:
        checks["vector_store"] = "failed"

    try:
        await embedding_model.ping()
        checks["embedding_model"] = "ok"
    except Exception:
        checks["embedding_model"] = "failed"

    all_healthy = all(v == "ok" for v in checks.values())
    if not all_healthy:
        return JSONResponse(
            status_code=503,
            content={"status": "not_ready", "checks": checks},
        )
    return {"status": "ready", "checks": checks}

@app.on_event("startup")
async def on_startup():
    global is_ready
    await vector_store.connect()
    await embedding_model.load()
    is_ready = True
```

The liveness probe tells Kubernetes whether to restart the pod. The readiness probe tells Kubernetes whether to send traffic to it. A pod that has a healthy process but a disconnected database should fail readiness (removing it from the load balancer) without failing liveness (which would restart it unnecessarily).
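The two probe outcomes combine into four cases. A toy decision function (purely illustrative; this logic lives inside the kubelet, not your code) makes the matrix concrete:

```python
def kubelet_action(live: bool, ready: bool) -> str:
    """What Kubernetes does for each liveness/readiness combination."""
    if not live:
        return "restart container"              # liveness failure always restarts
    if not ready:
        return "remove from Service endpoints"  # keep running, stop traffic
    return "serve traffic"

assert kubelet_action(live=True, ready=False) == "remove from Service endpoints"
```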

## Consul for Multi-Environment Discovery

When your agent services span multiple environments — some on Kubernetes, some on bare-metal GPU servers, some in a different cloud — Consul provides service discovery that works across boundaries:

```python
import consul

class ConsulServiceRegistry:
    def __init__(self, host: str = "consul-server", port: int = 8500):
        self.client = consul.Consul(host=host, port=port)

    def register(
        self,
        service_name: str,
        service_id: str,
        address: str,
        port: int,
        tags: list[str] | None = None,
    ):
        self.client.agent.service.register(
            name=service_name,
            service_id=service_id,
            address=address,
            port=port,
            tags=tags or [],
            check=consul.Check.http(
                f"http://{address}:{port}/health/ready",
                interval="10s",
                timeout="5s",
                deregister="30s",
            ),
        )

    def discover(self, service_name: str) -> list[dict]:
        _, services = self.client.health.service(
            service_name, passing=True
        )
        return [
            {
                "address": svc["Service"]["Address"],
                "port": svc["Service"]["Port"],
                "tags": svc["Service"]["Tags"],
            }
            for svc in services
        ]
```

## Client-Side Load Balancing

With service discovery returning multiple healthy instances, implement client-side load balancing for smarter routing:

```python
import httpx

class LoadBalancedClient:
    def __init__(self, registry: ConsulServiceRegistry, service: str):
        self.registry = registry
        self.service = service
        self._instances: list[dict] = []
        self._index = 0

    async def refresh_instances(self):
        self._instances = self.registry.discover(self.service)

    def next_instance(self) -> dict:
        if not self._instances:
            raise RuntimeError(f"No healthy instances for {self.service}")
        # Round-robin selection
        instance = self._instances[self._index % len(self._instances)]
        self._index += 1
        return instance

    async def call(self, path: str, payload: dict) -> dict:
        instance = self.next_instance()
        url = f"http://{instance['address']}:{instance['port']}{path}"
        async with httpx.AsyncClient() as client:
            resp = await client.post(url, json=payload, timeout=10.0)
            resp.raise_for_status()
            return resp.json()
```
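Round-robin alone still routes requests to an instance that died between registry refreshes. A failover layer that walks the instance list on connection errors narrows that window (a sketch; `attempt` stands in for the HTTP call so the retry logic stays transport-agnostic):

```python
import asyncio
from typing import Awaitable, Callable

async def call_with_failover(
    instances: list[dict],
    attempt: Callable[[dict], Awaitable[dict]],
    max_tries: int = 3,
) -> dict:
    """Try instances in order until one succeeds or tries run out."""
    last_error: Exception | None = None
    for instance in instances[:max_tries]:
        try:
            return await attempt(instance)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc  # unreachable instance; move to the next one
    raise RuntimeError("all failover attempts failed") from last_error
```

Wiring this into `LoadBalancedClient.call` would mean passing a closure that issues the `httpx` POST for a given instance, so one dead replica costs a single failed connection attempt instead of a user-visible error.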

## FAQ

### Is Kubernetes DNS sufficient, or do I need Consul?

Kubernetes DNS is sufficient if all your agent services run within a single Kubernetes cluster. It requires zero configuration and integrates natively with Kubernetes health checks. Add Consul only if your services span multiple clusters, include non-Kubernetes workloads (like GPU servers running outside the cluster), or you need advanced features like service mesh, key-value configuration, or multi-datacenter discovery.

### How often should health checks run for AI agent services?

Every 10 seconds for readiness checks and every 20 seconds for liveness checks is a good default. AI services that load large models during startup should use a longer `initialDelaySeconds` (30-60 seconds) to avoid being killed before they finish loading. For latency-sensitive agent systems, consider reducing readiness check intervals to 5 seconds.
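For slow model loading, a `startupProbe` is often cleaner than a large `initialDelaySeconds`: liveness checks are suspended until the startup probe succeeds, and the pod gets up to `failureThreshold × periodSeconds` to finish loading. A sketch (paths, port, and thresholds illustrative):

```yaml
startupProbe:
  httpGet:
    path: /health/ready
    port: 8002
  periodSeconds: 5
  failureThreshold: 24   # up to 120s for model loading
livenessProbe:
  httpGet:
    path: /health/live
    port: 8002
  periodSeconds: 20
```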

### What happens when a service has zero healthy instances?

The calling service should implement a circuit breaker pattern. After a threshold of consecutive failures (e.g., 5), the circuit opens and the caller immediately returns an error instead of waiting for timeouts. This prevents cascading failures where one unhealthy service causes all upstream services to block on network timeouts.
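The threshold logic described above fits in a few lines. A minimal, synchronous sketch (a real deployment would add a half-open state that retries after a cooldown, as libraries like `pybreaker` do):

```python
class CircuitBreaker:
    """Open after `threshold` consecutive failures; callers fail fast while open."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, fn, *args, **kwargs):
        if self.is_open:
            # Fail immediately instead of blocking on a network timeout.
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1   # count consecutive failures
            raise
        self.failures = 0        # any success closes the circuit
        return result
```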

---

#ServiceDiscovery #Kubernetes #Consul #Microservices #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/service-discovery-ai-agent-microservices-consul-kubernetes
