---
title: "Named Entity Recognition for AI Agents: Extracting People, Places, and Organizations"
description: "Learn how to implement Named Entity Recognition in AI agent pipelines using spaCy and LLMs, covering entity types, custom entity training, and real-time extraction strategies."
canonical: https://callsphere.ai/blog/named-entity-recognition-ai-agents-extracting-people-places-organizations
category: "Learn Agentic AI"
tags: ["NER", "NLP", "spaCy", "Entity Extraction", "Python", "AI Agents"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:43.990Z
---

# Named Entity Recognition for AI Agents: Extracting People, Places, and Organizations

> Learn how to implement Named Entity Recognition in AI agent pipelines using spaCy and LLMs, covering entity types, custom entity training, and real-time extraction strategies.

## Why Agents Need Named Entity Recognition

When an AI agent receives a message like "Schedule a meeting with Sarah Chen at the Austin office next Tuesday," it must extract three distinct pieces of structured information: a person (Sarah Chen), a location (Austin office), and a date (next Tuesday). Named Entity Recognition (NER) is the NLP technique that performs this extraction automatically.

Without NER, an agent has to rely entirely on the LLM to parse every incoming message from scratch. LLMs can extract entities, but a dedicated NER pipeline runs locally without a per-request API call, which makes it faster, cheaper, and more predictable for high-volume workloads. The best agent architectures combine both approaches: fast NER for common entities and an LLM fallback for ambiguous cases.

## Entity Types Every Agent Developer Should Know

In a voice agent, entity extraction typically sits between speech-to-text and tool calls. The example call flow below, for a salon booking agent, shows where it fits:

```mermaid
flowchart LR
    CALLER(["Client"])
    subgraph TEL["Telephony"]
        SIP["Twilio SIP and PSTN"]
    end
    subgraph BRAIN["Salon AI Agent"]
        STT["Streaming STT<br/>Deepgram or Whisper"]
        NLU{"Intent and<br/>Entity Extraction"}
        TOOLS["Tool Calls"]
        TTS["Streaming TTS<br/>ElevenLabs or Rime"]
    end
    subgraph DATA["Live Data Plane"]
        CRM[("CRM and Notes")]
        CAL[("Calendar and<br/>Schedule")]
        KB[("Knowledge Base<br/>and Policies")]
    end
    subgraph OUT["Outcomes"]
        O1(["Appointment booked"])
        O2(["Reschedule completed"])
        O3(["Stylist handoff"])
    end
    CALLER --> SIP --> STT --> NLU
    NLU -->|Lookup| TOOLS
    TOOLS <--> CRM
    TOOLS <--> CAL
    TOOLS <--> KB
    NLU --> TTS --> SIP --> CALLER
    NLU -->|Resolved| O1
    NLU -->|Schedule| O2
    NLU -->|Escalate| O3
    style CALLER fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style NLU fill:#4f46e5,stroke:#4338ca,color:#fff
    style O1 fill:#059669,stroke:#047857,color:#fff
    style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
    style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937
```

The standard NER taxonomy covers these categories:

- **PERSON** — individual names (Sarah Chen, Dr. Patel)
- **ORG** — companies, agencies, institutions (Acme Corp, FDA)
- **GPE** — geopolitical entities like countries, cities, states (Austin, France)
- **DATE** — absolute or relative dates (March 15, next Tuesday)
- **MONEY** — monetary values ($500, 12.5 million euros)
- **PRODUCT** — named products (iPhone 16, Tesla Model Y)

Most pre-trained NER models handle these out of the box. Custom entities — like internal project names, medical codes, or proprietary terms — require additional training.
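
To make the taxonomy concrete, the sketch below maps NER labels onto the slots a scheduling agent might fill. The slot names and the `entities_to_slots` helper are hypothetical, chosen only for illustration; the input uses the same label-to-texts dictionary shape as the extraction code in this article.

```python
# Hypothetical mapping from NER labels to an agent's slot names.
LABEL_TO_SLOT = {
    "PERSON": "attendee",
    "ORG": "company",
    "GPE": "location",
    "DATE": "when",
    "MONEY": "amount",
    "PRODUCT": "product",
}


def entities_to_slots(entities: dict[str, list[str]]) -> dict[str, list[str]]:
    """Rename known labels to slot names; drop labels the agent does not use."""
    slots: dict[str, list[str]] = {}
    for label, texts in entities.items():
        slot = LABEL_TO_SLOT.get(label)
        if slot is not None:
            slots.setdefault(slot, []).extend(texts)
    return slots


extracted = {
    "PERSON": ["Sarah Chen"],
    "GPE": ["Austin"],
    "DATE": ["next Tuesday"],
    "CARDINAL": ["3"],  # not in the mapping, so it is dropped
}
slots = entities_to_slots(extracted)
print(slots)
# {'attendee': ['Sarah Chen'], 'location': ['Austin'], 'when': ['next Tuesday']}
```

Keeping the mapping in one table means the rest of the agent never has to know which NER model produced the labels.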

## NER with spaCy: The Fast Path

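spaCy provides production-grade NER. Its small CPU models typically process a short message in milliseconds; the transformer model used below is more accurate but slower, and benefits from a GPU. Here is a complete extraction pipeline.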

```python
import spacy

# Transformer-based model; install it first with:
#   python -m spacy download en_core_web_trf
nlp = spacy.load("en_core_web_trf")

def extract_entities(text: str) -> dict[str, list[str]]:
    """Extract named entities grouped by type."""
    doc = nlp(text)
    entities: dict[str, list[str]] = {}

    for ent in doc.ents:
        if ent.label_ not in entities:
            entities[ent.label_] = []
        if ent.text not in entities[ent.label_]:
            entities[ent.label_].append(ent.text)

    return entities

message = "Tell John at Microsoft to review the Q3 report by March 20th."
result = extract_entities(message)
# Typical output (exact spans and labels can vary by model version):
# {'PERSON': ['John'], 'ORG': ['Microsoft'], 'DATE': ['Q3', 'March 20th']}
```

## Training Custom Entities

When your agent operates in a specialized domain, you need custom entity types. Here is how to train spaCy to recognize a custom PRODUCT_CODE entity.

```python
import spacy
from spacy.training import Example
import random

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("PRODUCT_CODE")

train_data = [
    ("Order SKU-4829 immediately", {"entities": [(6, 14, "PRODUCT_CODE")]}),
    ("Check stock for SKU-1100", {"entities": [(16, 24, "PRODUCT_CODE")]}),
    ("We need SKU-7753 and SKU-2201", {
        "entities": [(8, 16, "PRODUCT_CODE"), (21, 29, "PRODUCT_CODE")]
    }),
]

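# Build Example objects once, then initialize the pipeline from them.
examples = [
    Example.from_dict(nlp.make_doc(text), annotations)
    for text, annotations in train_data
]
# spaCy v3: initialize() replaces the deprecated begin_training()
optimizer = nlp.initialize(lambda: examples)

for epoch in range(30):
    random.shuffle(examples)
    losses = {}
    for example in examples:
        nlp.update([example], sgd=optimizer, losses=losses)

doc = nlp("Ship SKU-9981 to warehouse B")
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
# Intended result: SKU-9981 -> PRODUCT_CODE
# (three examples are a toy dataset, so output can vary between runs)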
```
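
Hand-counting character offsets like the ones in `train_data` is error-prone. One way to avoid it, assuming your codes follow a regular pattern, is to generate the annotations with a regular expression. The `annotate` helper and the `SKU-\d{4}` pattern below are illustrative assumptions, not part of spaCy.

```python
import re

SKU_PATTERN = re.compile(r"SKU-\d{4}")  # assumed format: "SKU-" plus four digits


def annotate(text: str, pattern: re.Pattern, label: str) -> tuple[str, dict]:
    """Return a (text, annotations) pair in the shape used by train_data."""
    spans = [(m.start(), m.end(), label) for m in pattern.finditer(text)]
    return text, {"entities": spans}


sentences = [
    "Order SKU-4829 immediately",
    "We need SKU-7753 and SKU-2201",
]
generated = [annotate(s, SKU_PATTERN, "PRODUCT_CODE") for s in sentences]
for text, annotations in generated:
    print(annotations["entities"])
# [(6, 14, 'PRODUCT_CODE')]
# [(8, 16, 'PRODUCT_CODE'), (21, 29, 'PRODUCT_CODE')]
```

The same pattern can double as a rule-based baseline to compare against before you invest in training a model.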

## LLM-Based NER for Complex Cases

For ambiguous text or zero-shot entity types, LLMs provide flexible extraction without training data.

```python
import json

import openai  # requires openai>=1.0 and OPENAI_API_KEY in the environment

def llm_extract_entities(text: str, entity_types: list[str]) -> dict:
    """Use an LLM for zero-shot entity extraction."""
    prompt = f"""Extract the following entity types from the text.
Return JSON only. Entity types: {', '.join(entity_types)}

Text: {text}"""

    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    # message.content is a JSON string; parse it to match the dict return type
    return json.loads(response.choices[0].message.content)

result = llm_extract_entities(
    "Dr. Amara Osei from Nairobi General prescribed amoxicillin 500mg",
    ["PERSON", "FACILITY", "MEDICATION", "DOSAGE"]
)
```
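
LLM output is worth validating before an agent acts on it. The sketch below assumes the model replies with a JSON object that maps each entity type to a list of strings (the prompt above does not guarantee that shape) and keeps only the requested types whose values appear verbatim in the source text. `validate_entities` is a hypothetical helper that accepts either the raw JSON string or an already parsed dict.

```python
import json
from typing import Union


def validate_entities(
    reply: Union[str, dict], text: str, entity_types: list[str]
) -> dict[str, list[str]]:
    """Keep only requested entity types whose values occur in the source text."""
    data = reply
    if isinstance(reply, str):
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            return {}
    if not isinstance(data, dict):
        return {}

    clean: dict[str, list[str]] = {}
    for label in entity_types:
        values = data.get(label, [])
        if isinstance(values, str):  # tolerate a single string instead of a list
            values = [values]
        if not isinstance(values, list):
            continue
        kept = [v for v in values if isinstance(v, str) and v in text]
        if kept:
            clean[label] = kept
    return clean


text = "Dr. Amara Osei from Nairobi General prescribed amoxicillin 500mg"
# Simulated model reply: one ungrounded value ("250mg") and one unrequested type.
reply = (
    '{"PERSON": ["Dr. Amara Osei"], "MEDICATION": "amoxicillin", '
    '"DOSAGE": ["250mg"], "CITY": ["Nairobi"]}'
)
checked = validate_entities(reply, text, ["PERSON", "FACILITY", "MEDICATION", "DOSAGE"])
print(checked)
# {'PERSON': ['Dr. Amara Osei'], 'MEDICATION': ['amoxicillin']}
```

The verbatim-substring check is deliberately strict: it rejects hallucinated values, but it also rejects normalized ones (for example "500 mg" for "500mg"), so loosen it if your prompt asks the model to normalize.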

## Integrating NER into an Agent Pipeline

The most effective pattern is a two-tier approach: fast spaCy extraction first, with LLM fallback for unrecognized patterns.

```python
import spacy


class NERProcessor:
    def __init__(self):
        self.nlp = spacy.load("en_core_web_sm")

    def process(self, text: str) -> dict:
        doc = self.nlp(text)
        entities: dict[str, list[str]] = {}

        for ent in doc.ents:
            entities.setdefault(ent.label_, []).append(ent.text)

        # spaCy's built-in NER does not expose a per-entity confidence score
        # (ent.kb_id_ is an entity-linking ID, not a probability). As a
        # heuristic stand-in, flag proper nouns that no entity span covers.
        unrecognized = [
            token.text
            for token in doc
            if token.pos_ == "PROPN" and not token.ent_type_
        ]

        return {
            "entities": entities,
            "needs_llm_review": unrecognized,
        }
```
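
One way to act on `needs_llm_review` is a small router that calls the LLM only when the fast pass leaves something unresolved. In this sketch, `fast_extract` and `llm_extract` are injected callables standing in for the spaCy and LLM extractors, and the merge policy (fast-tier results are kept, LLM results fill the gaps) is an assumption rather than a fixed rule.

```python
from typing import Callable

# fast tier returns {"entities": {label: [texts]}, "needs_llm_review": [texts]}
FastExtractor = Callable[[str], dict]
# LLM tier returns {label: [texts]}
LLMExtractor = Callable[[str], dict]


def route(text: str, fast_extract: FastExtractor, llm_extract: LLMExtractor) -> dict:
    """Run the fast tier first; call the LLM only if something needs review."""
    fast = fast_extract(text)
    entities = {label: list(texts) for label, texts in fast["entities"].items()}
    used_llm = False

    if fast["needs_llm_review"]:
        used_llm = True
        for label, texts in llm_extract(text).items():
            bucket = entities.setdefault(label, [])
            bucket.extend(t for t in texts if t not in bucket)

    return {"entities": entities, "used_llm": used_llm}


# Stub extractors so the sketch runs without spaCy or an API key.
def fake_fast(text: str) -> dict:
    return {"entities": {"PERSON": ["John"]}, "needs_llm_review": ["Zephyr"]}


def fake_llm(text: str) -> dict:
    return {"PERSON": ["John"], "PROJECT": ["Zephyr"]}


result = route("Ask John about Zephyr", fake_fast, fake_llm)
print(result)
# {'entities': {'PERSON': ['John'], 'PROJECT': ['Zephyr']}, 'used_llm': True}
```

Injecting the extractors keeps the routing logic testable on its own and makes the `used_llm` flag easy to log, which helps when tracking how often the slower tier is actually needed.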

## FAQ

### When should I use spaCy NER versus LLM-based extraction?

Use spaCy for high-throughput scenarios where you need consistent, fast extraction of standard entity types. Use LLM-based extraction when you need zero-shot recognition of novel entity types, when the text is highly ambiguous, or when you cannot invest in training data for a custom model.

### How do I handle entities that span multiple tokens or contain special characters?

spaCy handles multi-token entities natively through its span-based architecture. During training, define entity boundaries using character offsets that encompass the full span. For special characters like hyphens or periods in entity names, add custom tokenization rules so the tokenizer does not split them incorrectly; an entity boundary that falls inside a token cannot be aligned during training.

### Can I combine multiple NER models in a single agent pipeline?

Yes. A common pattern is to run a general-purpose model for standard entities (PERSON, ORG, GPE) and a domain-specific model for specialized entities (MEDICATION, LEGAL_CLAUSE). Merge the results and use a deduplication step to handle overlapping spans, keeping the prediction with the higher confidence or, when the models do not expose scores, the one from the model you trust more for that entity type.
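
The deduplication step can be sketched in plain Python. Here each prediction is a `(start, end, label, score)` tuple of character offsets. The `score` field is an assumption: spaCy's built-in NER does not provide one by default, so in practice it could be a fixed per-model priority. When spans overlap, the higher score wins, with the longer span as a tie-breaker.

```python
Span = tuple[int, int, str, float]  # (start, end, label, score)


def merge_spans(*model_outputs: list[Span]) -> list[Span]:
    """Merge predictions from several models, resolving overlaps greedily."""
    candidates = [span for output in model_outputs for span in output]
    # Best candidates first: higher score, then longer span.
    candidates.sort(key=lambda s: (-s[3], -(s[1] - s[0])))

    kept: list[Span] = []
    for start, end, label, score in candidates:
        overlaps = any(start < k_end and k_start < end for k_start, k_end, _, _ in kept)
        if not overlaps:
            kept.append((start, end, label, score))
    return sorted(kept)  # back to document order


text = "Dr. Patel at Mercy General prescribed amoxicillin"
general = [(0, 9, "PERSON", 0.9), (13, 26, "ORG", 0.6)]
medical = [(13, 26, "FACILITY", 0.8), (38, 49, "MEDICATION", 0.95)]

merged = merge_spans(general, medical)
print([(text[start:end], label) for start, end, label, _ in merged])
# [('Dr. Patel', 'PERSON'), ('Mercy General', 'FACILITY'), ('amoxicillin', 'MEDICATION')]
```

Because an identical span from a second model counts as an overlap, exact duplicates are removed by the same rule.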

---

Source: https://callsphere.ai/blog/named-entity-recognition-ai-agents-extracting-people-places-organizations
