---
title: "LangChain Output Parsers: Pydantic, JSON, and Structured Output Parsing"
description: "Learn how to extract structured data from LLM responses using LangChain output parsers — Pydantic models, JSON parsing, format instructions, and retry parsers for robust extraction."
canonical: https://callsphere.ai/blog/langchain-output-parsers-pydantic-json-structured-parsing
category: "Learn Agentic AI"
tags: ["LangChain", "Output Parsing", "Pydantic", "Structured Data", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T07:38:14.480Z
---

# LangChain Output Parsers: Pydantic, JSON, and Structured Output Parsing

> Learn how to extract structured data from LLM responses using LangChain output parsers — Pydantic models, JSON parsing, format instructions, and retry parsers for robust extraction.

## Why Structured Output Matters

LLMs produce free-form text by default. But downstream code needs structured data — objects, lists, dictionaries, typed fields. Output parsers bridge this gap by defining an expected schema, generating format instructions for the prompt, and parsing the LLM's response into the target structure.

Without structured parsing, you end up writing fragile regex or string-splitting logic that breaks when the model changes phrasing. LangChain's parsers standardize this process and include retry mechanisms for when the model produces malformed output.
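To make the failure mode concrete, here is a minimal stdlib sketch (a hypothetical helper, not LangChain code) of the hand-rolled approach: a regex pulls a JSON blob out of free text, then required keys are checked by hand. Every phrasing or formatting change in the model's reply is a new way for this glue code to break.

```python
import json
import re

def extract_review(text: str) -> dict:
    """Hand-rolled extraction: find a JSON object in free text and
    check required keys manually. Brittle by construction."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    for key in ("title", "rating"):
        if key not in data:
            raise ValueError(f"missing required key: {key}")
    return data

reply = 'Sure! Here is the review: {"title": "Inception", "rating": 8.5}'
print(extract_review(reply))  # {'title': 'Inception', 'rating': 8.5}
```

Output parsers replace this with a declared schema, generated format instructions, and validated parsing.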

## The with_structured_output Approach

Modern LangChain models support `with_structured_output()`, which uses the model's native structured output capability (function calling or JSON mode) rather than text parsing.

```mermaid
flowchart LR
    PROMPT(["Prompt"])
    SCHEMA["Pydantic schema
as tool schema"]
    LLM["LLM call
function/JSON mode"]
    JSON["Structured
JSON"]
    OBJ(["Validated
Pydantic object"])
    SCHEMA --> LLM
    PROMPT --> LLM --> JSON --> OBJ
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OBJ fill:#059669,stroke:#047857,color:#fff
```

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 0 to 10")
    summary: str = Field(description="One sentence summary")
    recommended: bool = Field(description="Whether you recommend this movie")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm = llm.with_structured_output(MovieReview)

result = structured_llm.invoke("Review the movie Inception")
print(type(result))        # <class '__main__.MovieReview'>
print(result.title)        # "Inception"
print(result.rating)       # 8.5
print(result.recommended)  # True
```

This is the recommended approach for models that support it. The Pydantic schema is converted to a function/tool schema, and the model returns structured JSON that is automatically parsed.
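You can inspect the JSON Schema that this conversion starts from. Pydantic v2 exposes it via `model_json_schema()`, and the `Field` descriptions become parameter descriptions in the resulting tool schema. A minimal sketch using a trimmed version of the schema above:

```python
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="The movie title")
    rating: float = Field(description="Rating from 0 to 10")

schema = MovieReview.model_json_schema()
print(schema["properties"]["rating"]["description"])  # Rating from 0 to 10
print(schema["required"])                             # ['title', 'rating']
```

Clear field descriptions matter: they are the only guidance the model gets about what belongs in each field.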

## PydanticOutputParser

For models without native structured output, the `PydanticOutputParser` adds format instructions to the prompt and parses the text response.

```python
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="Easy, Medium, or Hard")

parser = PydanticOutputParser(pydantic_object=Recipe)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful cooking assistant."),
    ("human", "Give me a recipe for {dish}.\n\n{format_instructions}"),
])

chain = prompt.partial(
    format_instructions=parser.get_format_instructions()
) | llm | parser

recipe = chain.invoke({"dish": "pasta carbonara"})
print(recipe.name)
print(recipe.ingredients)
print(recipe.prep_time_minutes)
```

`parser.get_format_instructions()` returns a string that tells the model exactly what JSON structure to produce. The parser then validates the response against the Pydantic model.
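The validation step is plain Pydantic. A standalone sketch of that final step, calling `model_validate_json()` on the `Recipe` schema above, shows why type errors surface as exceptions rather than silently wrong fields:

```python
from pydantic import BaseModel, Field, ValidationError

class Recipe(BaseModel):
    name: str = Field(description="Name of the recipe")
    ingredients: list[str] = Field(description="List of ingredients")
    prep_time_minutes: int = Field(description="Preparation time in minutes")
    difficulty: str = Field(description="Easy, Medium, or Hard")

good = ('{"name": "Carbonara", "ingredients": ["spaghetti", "eggs"], '
        '"prep_time_minutes": 25, "difficulty": "Medium"}')
recipe = Recipe.model_validate_json(good)
print(recipe.prep_time_minutes)  # 25

# Wrong types: ingredients is not a list, prep time is not an int
bad = ('{"name": "Carbonara", "ingredients": "spaghetti", '
       '"prep_time_minutes": "soon", "difficulty": "Medium"}')
try:
    Recipe.model_validate_json(bad)
except ValidationError as err:
    print(len(err.errors()))  # 2
```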

## JsonOutputParser

When you want raw dictionaries instead of Pydantic objects, use `JsonOutputParser`.

```python
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser()

chain = prompt | llm | parser
result = chain.invoke({"dish": "tacos"})
print(type(result))  # <class 'dict'>
```

You can optionally provide a Pydantic model for format instructions without strict validation:

```python
parser = JsonOutputParser(pydantic_object=Recipe)
# Generates format instructions but returns a dict, not a Recipe object
```
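JSON parsing layers also have to cope with models that wrap their JSON in a markdown fence. A rough stdlib sketch of that cleanup (a hypothetical helper, not the library's actual implementation):

```python
import json
import re

def parse_json_reply(text: str) -> dict:
    """Strip an optional ```json ... ``` fence before parsing, the kind
    of cleanup a JSON parser layer performs on chat model output."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)

reply = '```json\n{"name": "Tacos", "difficulty": "Easy"}\n```'
print(parse_json_reply(reply))  # {'name': 'Tacos', 'difficulty': 'Easy'}
```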

## StrOutputParser and CommaSeparatedListOutputParser

For simpler outputs, use lightweight parsers.

```python
from langchain_core.output_parsers import (
    CommaSeparatedListOutputParser,
    StrOutputParser,
)
from langchain_core.prompts import ChatPromptTemplate

# Plain string: extracts the text content from an AIMessage
# (ai_message is an AIMessage from a prior llm.invoke(...) call)
str_parser = StrOutputParser()
result = str_parser.invoke(ai_message)  # "Just the text content"

# Comma-separated list: pair it with the parser's format instructions
list_parser = CommaSeparatedListOutputParser()
list_prompt = ChatPromptTemplate.from_template(
    "List popular {topic}.\n{format_instructions}"
).partial(format_instructions=list_parser.get_format_instructions())
chain = list_prompt | llm | list_parser
result = chain.invoke({"topic": "Python web frameworks"})
# e.g. ["Django", "Flask", "FastAPI"]
```

## Output-Fixing and Retry Parsers

LLMs sometimes produce invalid output. Retry parsers automatically fix these failures.

```python
from langchain.output_parsers import OutputFixingParser, RetryOutputParser
from langchain_openai import ChatOpenAI

base_parser = PydanticOutputParser(pydantic_object=Recipe)

# Option 1: Use another LLM call to fix malformed output
fixing_parser = OutputFixingParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
)

# If the base parser fails, the fixing parser sends the bad output
# to the LLM with instructions to fix the formatting
bad_output_string = "{'name': 'Pancakes'}"  # single quotes: not valid JSON
result = fixing_parser.parse(bad_output_string)
```

`OutputFixingParser` receives the malformed output and asks the LLM to reformat it. `RetryOutputParser` goes further by resending the original prompt along with the error, giving the LLM full context to produce a corrected response.

```python
retry_parser = RetryOutputParser.from_llm(
    parser=base_parser,
    llm=ChatOpenAI(model="gpt-4o-mini"),
    max_retries=2,
)

# Unlike OutputFixingParser, it needs the original prompt value too:
# result = retry_parser.parse_with_prompt(bad_output_string, prompt_value)
```
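Both wrappers follow the same parse, fail, repair loop. A stdlib sketch of the control flow, where the `fix` callable stands in for the extra LLM call:

```python
import json

def parse_with_fixing(raw, parse, fix, max_retries=2):
    """Try to parse; on failure, hand the bad text and the error to a
    fixer and try again, up to max_retries repair attempts."""
    for attempt in range(max_retries + 1):
        try:
            return parse(raw)
        except ValueError as err:
            if attempt == max_retries:
                raise
            raw = fix(raw, err)  # in LangChain, this is the extra LLM call

# A toy fixer that repairs the common single-quotes mistake
fixed = parse_with_fixing(
    "{'name': 'Tacos'}",
    parse=json.loads,
    fix=lambda raw, err: raw.replace("'", '"'),
)
print(fixed)  # {'name': 'Tacos'}
```

Each repair attempt costs an extra model call, so keep `max_retries` small and log failures for prompt debugging.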

## Enum and Datetime Parsers

LangChain includes specialized parsers for common types.

```python
from langchain.output_parsers import EnumOutputParser
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

parser = EnumOutputParser(enum=Sentiment)
result = parser.parse("positive")
print(result)  # Sentiment.POSITIVE
```
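`DatetimeOutputParser`, also importable from `langchain.output_parsers`, does the same for timestamps. At their core, both parsers perform a strict conversion of the stripped response text, sketched here with the stdlib (the timestamp format string is an assumption; the real parser advertises its expected pattern in its format instructions):

```python
from datetime import datetime
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

def parse_enum(text: str) -> Sentiment:
    # EnumOutputParser-style: match the stripped reply against enum values
    return Sentiment(text.strip())

def parse_datetime(text: str) -> datetime:
    # DatetimeOutputParser-style: parse a fixed timestamp pattern
    # (this specific format string is an illustrative assumption)
    return datetime.strptime(text.strip(), "%Y-%m-%dT%H:%M:%S")

print(parse_enum(" positive\n"))                   # Sentiment.POSITIVE
print(parse_datetime("2026-03-17T09:30:00").year)  # 2026
```

An unrecognized value raises (`ValueError` here, `OutputParserException` in LangChain), so wrap these with a fixing parser when the model is unreliable.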

## Composing Parsers in LCEL

Parsers are runnables, so they integrate seamlessly into LCEL chains.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Analysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(description="Confidence score 0-1")
    key_phrases: list[str] = Field(description="Important phrases")

parser = PydanticOutputParser(pydantic_object=Analysis)

chain = (
    ChatPromptTemplate.from_template(
        "Analyze this text: {text}\n{format_instructions}"
    ).partial(format_instructions=parser.get_format_instructions())
    | ChatOpenAI(model="gpt-4o-mini")
    | parser
)

analysis = chain.invoke({"text": "The product quality is outstanding!"})
print(analysis.sentiment)    # "positive"
print(analysis.confidence)   # 0.95
```

## FAQ

### Should I use with_structured_output or PydanticOutputParser?

Use `with_structured_output()` whenever the model supports it — it is more reliable because the model returns structured JSON natively rather than embedding JSON in free text. Fall back to `PydanticOutputParser` for models that lack native structured output support.

### What happens when the LLM ignores format instructions?

The parser raises an `OutputParserException`. Wrap your parser with `OutputFixingParser` or `RetryOutputParser` to handle these failures automatically. Alternatively, `with_structured_output` avoids this issue entirely by constraining the output format at the API level.

### Can I parse streaming output into structured objects?

Yes, if the model supports streaming structured output. Use `JsonOutputParser` with `chain.stream()` to receive partial JSON objects as they are generated. For Pydantic parsing, you typically need the full response before validation can occur.

---

#LangChain #OutputParsing #Pydantic #StructuredData #Python #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/langchain-output-parsers-pydantic-json-structured-parsing
