instructor — Structured LLM Outputs

instructor wraps the Anthropic and OpenAI clients to enforce Pydantic schema validation on every LLM response, with automatic retry on validation failure. It is a widely used library for getting reliable structured outputs from LLMs in Python.

Why It Exists

Raw LLM APIs return strings. Getting structured data means one of:

  • Parsing JSON yourself (brittle, fails on malformed output)
  • Using response_format={"type": "json_object"} (no schema enforcement)
  • Using instructor (schema enforcement + automatic retry + streaming support)

instructor patches the client so you call .chat.completions.create() (OpenAI) or .messages.create() (Anthropic) as normal, but pass response_model=YourPydanticModel and get back a validated instance.
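
The first two options fail in different ways. A minimal, dependency-free sketch of both failure modes follows; the sample strings and the parse_manually helper are made up for illustration and are not part of instructor.

```python
import json
from typing import Optional

# Hypothetical raw model outputs, invented for illustration.
malformed = 'Sure! Here is the data: {"name": "John Smith", "age": 34'
wrong_type = '{"name": "John Smith", "age": "thirty-four"}'


def parse_manually(raw: str) -> Optional[dict]:
    """Parse JSON by hand; return None when the text is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Chatty or truncated output cannot be parsed at all.
print(parse_manually(malformed))  # None

# Valid JSON parses, but nothing checks the field types.
data = parse_manually(wrong_type)
print(type(data["age"]).__name__)  # str, although the caller expects an int
```

Schema enforcement closes the second gap: a string where an int is declared becomes a validation error instead of a silent bug downstream.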


Installation

pip install instructor

Basic Usage

With Anthropic

import anthropic
import instructor
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class User(BaseModel):
    name: str
    age: int
    email: str

user = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract: John Smith, 34, john@example.com"}
    ],
    response_model=User,
)

print(user.name)   # "John Smith"
print(user.age)    # 34

With OpenAI

import openai
import instructor

client = instructor.from_openai(openai.OpenAI())

# Reuses the User model defined in the Anthropic example above.
user = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract: Alice, 28, alice@example.com"}],
    response_model=User,
)

Validation with Pydantic

Pydantic validators run after extraction. If they fail, instructor retries the LLM call with the validation error as feedback, up to max_retries times.

from pydantic import BaseModel, field_validator

class UserProfile(BaseModel):
    name: str
    age: int
    email: str

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, v: int) -> int:
        if v < 0:
            raise ValueError("age must be positive")
        return v

    @field_validator("email")
    @classmethod
    def email_must_have_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("not a valid email")
        return v

If the LLM returns age: -5, the validator raises and instructor sends the error back to the model ("age must be positive. Please fix"), then retries. The default is 3 retries.
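
The feedback loop can be sketched without any dependencies. This is a conceptual sketch only, not instructor's implementation: validate_user stands in for a Pydantic model, fake_model stands in for the LLM, create_with_retries is a hypothetical name, and the sketch treats max_retries as the total number of attempts.

```python
import json
from typing import Callable

Message = dict[str, str]


def validate_user(data: dict) -> dict:
    """Stand-in for a Pydantic model: raises ValueError on bad data."""
    if not isinstance(data.get("age"), int):
        raise ValueError("age must be an integer")
    if data["age"] < 0:
        raise ValueError("age must be positive")
    return data


def create_with_retries(
    call_model: Callable[[list[Message]], str],
    messages: list[Message],
    max_retries: int = 3,  # assumption: counted as total attempts in this sketch
) -> dict:
    """Call the model, validate the reply, and feed validation errors back."""
    history = list(messages)
    last_error = None
    for _ in range(max_retries):
        raw = call_model(history)
        try:
            return validate_user(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            history.append({"role": "assistant", "content": raw})
            history.append(
                {"role": "user", "content": f"Validation error: {exc}. Please fix."}
            )
    raise RuntimeError(f"gave up after {max_retries} attempts: {last_error}")


def fake_model(history: list[Message]) -> str:
    """Hypothetical model: wrong at first, correct once it sees feedback."""
    saw_feedback = any("Validation error" in m["content"] for m in history)
    if saw_feedback:
        return '{"name": "John", "age": 34}'
    return '{"name": "John", "age": -5}'


user = create_with_retries(fake_model, [{"role": "user", "content": "Extract: John, 34"}])
print(user)  # {'name': 'John', 'age': 34}
```

Each failed attempt costs a full model call, so a high retry limit on a schema the model cannot satisfy multiplies latency and spend.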

# Configure retries
profile = client.messages.create(
    ...,
    response_model=UserProfile,
    max_retries=5,
)

Nested Models

from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    founded: int
    headquarters: Address
    employees: Optional[int] = None

company = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me about Anthropic the AI company"}],
    response_model=Company,
)

print(company.headquarters.city)   # e.g. "San Francisco" (model-generated, not guaranteed)

Lists and Optional Fields

from typing import Optional
from pydantic import BaseModel

class Ingredient(BaseModel):
    name: str
    quantity: str

class Recipe(BaseModel):
    title: str
    ingredients: list[Ingredient]
    prep_time_minutes: int
    difficulty: Optional[str] = None

recipe = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Give me a pasta recipe"}],
    response_model=Recipe,
)

Streaming Partial Objects

For large structures, stream partial results as they arrive:

# create_partial wraps the model in Partial[...] itself, so pass the plain model.
for partial_user in client.messages.create_partial(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Extract user data from this CV..."}],
    response_model=UserProfile,
):
    print(partial_user.name)  # None until the name has been generated

Classification Pattern

A common use: enum-constrained classification.

from enum import Enum

class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class SentimentResult(BaseModel):
    sentiment: Sentiment
    confidence: float
    reasoning: str

result = client.messages.create(
    model="claude-haiku-4-5-20251001",  # fast model for classification
    max_tokens=256,
    messages=[{"role": "user", "content": "The product was okay, nothing special."}],
    response_model=SentimentResult,
)
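
The constraint comes from the enum itself: a label outside the declared members cannot be turned into a Sentiment. A small standalone sketch of that behaviour, using only the standard library (parse_sentiment is a hypothetical helper, not an instructor function):

```python
from enum import Enum


class Sentiment(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"


def parse_sentiment(label: str) -> Sentiment:
    """Convert a raw label to a Sentiment; unknown labels raise ValueError."""
    return Sentiment(label)


print(parse_sentiment("neutral") is Sentiment.neutral)  # True

try:
    parse_sentiment("mixed")
except ValueError as exc:
    print(f"rejected: {exc}")
```

When such a rejection happens during validation, it is fed back to the model like any other validation error, so the model gets a chance to pick one of the allowed labels.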

When to Use

Scenario | Use instructor?
Extracting structured data from text | Yes
Classification into known categories | Yes
Multi-step reasoning where intermediate steps need validation | Yes
Simple Q&A or chat | No (raw API is simpler)
Agent tool calls (tools already return typed data) | Usually no
Batch processing with strict schema requirements | Yes

Relationship to Other Approaches

  • Raw API tool_use — also returns structured data, but you define a JSON Schema tool instead of a Pydantic model. More verbose. Use when the schema needs to vary at runtime.
  • DSPy — optimises prompts automatically; instructor enforces output structure. Complementary: use DSPy to optimise the prompt, instructor to validate the output.
  • Guardrails AI — broader output safety/validation framework; instructor focuses specifically on Pydantic schema enforcement. See security/guardrails for comparison.
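
For the raw tool_use route, the schema is a plain dictionary, which is what makes it easy to vary at runtime. A sketch under assumptions: the dictionary follows the name / description / input_schema shape of an Anthropic tool definition, and make_extraction_tool is a hypothetical helper written for this note.

```python
def make_extraction_tool(name: str, fields: dict[str, str]) -> dict:
    """Build a tool definition from a field-name -> JSON-type mapping chosen at runtime."""
    return {
        "name": name,
        "description": f"Record the following fields: {', '.join(fields)}.",
        "input_schema": {
            "type": "object",
            "properties": {
                field: {"type": json_type} for field, json_type in fields.items()
            },
            "required": list(fields),
        },
    }


# Same shape as the User model from the basic usage example.
tool = make_extraction_tool(
    "extract_user", {"name": "string", "age": "integer", "email": "string"}
)
print(tool["input_schema"]["required"])  # ['name', 'age', 'email']
```

The trade-off is that the reply comes back as an untyped dict, so any validation beyond the JSON Schema has to be written by hand.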

Common Failure Cases

instructor exhausts max_retries and raises InstructorRetryException for a valid model response
Why: the model returns a valid JSON object but wraps it in a markdown code fence (```json ... ```); instructor's default parser fails to extract the JSON, retries the call each time, and eventually exhausts retries.
Detect: InstructorRetryException raised even though the model's raw response contains correct JSON when inspected manually; the validation error message says "JSON parse error" or "Expecting value".
Fix: set mode=instructor.Mode.MD_JSON for models that reliably wrap JSON in markdown; or set mode=instructor.Mode.JSON to force JSON mode via the API when supported.
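
To see why a fenced reply breaks a plain JSON parser, and what a markdown-aware parser has to do, here is a standalone sketch. It is not instructor's parser; extract_json and the sample reply are invented for illustration.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence marker, built here to keep this example readable

# Optional "json" language tag, then the payload, then the closing fence.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(raw: str) -> dict:
    """Parse JSON from a reply, unwrapping a markdown code fence if present."""
    match = FENCE_RE.search(raw)
    payload = match.group(1) if match else raw
    return json.loads(payload)


# Hypothetical model reply: correct JSON, wrapped in a fence.
fenced = FENCE + 'json\n{"name": "Alice", "age": 28}\n' + FENCE

try:
    json.loads(fenced)
except json.JSONDecodeError as exc:
    print(f"plain parse failed: {exc.msg}")  # Expecting value

print(extract_json(fenced))  # {'name': 'Alice', 'age': 28}
```

The "Expecting value" message is the same symptom listed under Detect above: the parser hits a backtick where it expects the start of a JSON value.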

Nested Pydantic model fails to validate because a required field is missing from the model output
Why: when the LLM generates a nested object, it may omit an inner required field, especially for deeply nested structures; instructor retries but the model continues to omit the same field, leading to max_retries exhaustion.
Detect: retry loop always fails on the same Field required validation error for a nested field; the model never includes the field despite the retry error message.
Fix: add a description to the missing field explaining what it represents and an example value; make the field Optional with a sensible default if it can genuinely be absent; simplify nested structures.

Partial[Model] streaming returns incomplete objects that are not caught before use
Why: create_partial() yields partial model instances as tokens arrive; calling code that accesses attributes on partially-filled objects may encounter None where a required field is expected, causing AttributeError or ValidationError.
Detect: intermittent AttributeError when processing partial stream results; the final complete object is valid but intermediate partials have None for expected fields.
Fix: check for None on all accessed attributes when consuming partial results; or accumulate the stream and only process the final complete object if partial updates are not needed for the UX.
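
Both fixes can be shown with a simulated stream. In this sketch PartialUser is a plain dataclass standing in for a partial model instance, and fake_partial_stream is a hypothetical generator; neither is part of instructor.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class PartialUser:
    """Stand-in for a partial model instance: every field may still be None."""
    name: Optional[str] = None
    age: Optional[int] = None
    email: Optional[str] = None


def fake_partial_stream() -> Iterator[PartialUser]:
    """Hypothetical stream in which fields fill in one at a time."""
    yield PartialUser()
    yield PartialUser(name="John Smith")
    yield PartialUser(name="John Smith", age=34)
    yield PartialUser(name="John Smith", age=34, email="john@example.com")


final = None
for partial in fake_partial_stream():
    # Guard before use: partial.name.upper() would raise AttributeError on None.
    if partial.name is not None:
        print(partial.name.upper())
    final = partial  # keep the latest snapshot; the last one is complete

print(final.email)  # john@example.com
```

If the UI does not need incremental updates, dropping the guard and using only final after the loop is the simpler option.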

Using the sync instructor.from_anthropic(anthropic.Anthropic()) client in an async function blocks the event loop
Why: the sync Anthropic client blocks the thread while waiting for the API response; inside an async function, this blocks the entire asyncio event loop, preventing other coroutines from running.
Detect: other concurrent async tasks pause during the instructor call; overall throughput degrades significantly when multiple instructor calls run concurrently.
Fix: use instructor.from_anthropic(AsyncAnthropic()) for the async client and await client.messages.create(...) in async functions.
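
The effect is visible in the order in which two concurrent tasks start and finish. A sketch using only asyncio, where time.sleep stands in for a sync client call and asyncio.sleep for an awaited async client call (both function names are hypothetical):

```python
import asyncio
import time


async def blocking_call(i: int, log: list) -> None:
    log.append(f"start {i}")
    time.sleep(0.01)  # stands in for a sync client call: blocks the event loop
    log.append(f"end {i}")


async def async_call(i: int, log: list) -> None:
    log.append(f"start {i}")
    await asyncio.sleep(0.01)  # stands in for an awaited async client call
    log.append(f"end {i}")


async def run(call) -> list:
    """Run two calls concurrently and return the order of events."""
    log = []
    await asyncio.gather(call(0, log), call(1, log))
    return log


blocking_log = asyncio.run(run(blocking_call))
async_log = asyncio.run(run(async_call))

print(blocking_log)  # ['start 0', 'end 0', 'start 1', 'end 1']
print(async_log)     # ['start 0', 'start 1', 'end 0', 'end 1']
```

With the blocking version the second task cannot even start until the first has finished, which is the throughput drop described under Detect.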

Open Questions

  • What performance characteristics only become problems at production scale?
  • What does this library handle poorly that its documentation does not mention?