OpenAI Agents SDK

OpenAI's official Python framework for multi-agent systems (March 2025) — lightweight, model-driven handoffs, built-in guardrails; trade-off vs LangGraph is simplicity at the cost of persistence and HITL support.


openai agents-sdk swarm handoffs guardrails tracing python

OpenAI's official Python framework for building multi-agent systems, released March 2025. The spiritual successor to their experimental Swarm library (late 2024). Lightweight by design. It doesn't try to do everything LangGraph does, but it's idiomatic, well-documented, and you'll encounter it in production repos.

Install: pip install openai-agents

Core Concepts

Four primitives cover almost everything:

Primitive	What it is
Agent	An LLM with a name, instructions, tools, and optional handoffs
Handoff	Transfer control from one agent to another mid-conversation
Guardrail	Input/output validation that runs before/after the agent
Runner	Executes the agent loop, manages history, handles tool calls

Basic Agent

from agents import Agent, Runner

agent = Agent(
    name="Support Assistant",
    instructions="You are a helpful customer support agent. Be concise.",
)

result = Runner.run_sync(agent, "How do I reset my password?")
print(result.final_output)

Runner.run_sync is the blocking version. For async:

import asyncio

async def main():
    result = await Runner.run(agent, "How do I reset my password?")
    print(result.final_output)

asyncio.run(main())

Tools

Any Python function decorated with @function_tool becomes available to the agent.

from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # In production: call a weather API
    return f"The weather in {city} is 18°C and cloudy."

@function_tool
def search_knowledge_base(query: str) -> str:
    """Search internal documentation for relevant articles."""
    # In production: call your vector store
    return f"Found 3 articles matching '{query}'..."

agent = Agent(
    name="Research Assistant",
    instructions="Help users find information. Use the knowledge base before answering.",
    tools=[get_weather, search_knowledge_base],
)

The SDK automatically extracts the function signature and docstring into a JSON schema that the model receives. Keep docstrings clear. They're part of the prompt.
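To make the point about docstrings concrete, here is a rough standard-library illustration of how a signature and docstring can be turned into a schema. It is not the SDK's implementation: `sketch_tool_schema` and its small type mapping are assumptions made up for this example, and the real schema the SDK produces is richer.

```python
import inspect
from typing import Any, Callable

# Illustrative mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def sketch_tool_schema(func: Callable[..., Any]) -> dict[str, Any]:
    """Build a JSON-schema-like description from a signature and docstring."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def get_weather(city: str, units: str = "metric") -> str:
    """Get current weather for a city."""
    return f"Weather for {city} ({units})"

schema = sketch_tool_schema(get_weather)
```

The docstring ends up verbatim in the `description` field, which is why a vague docstring tends to produce vague tool selection.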

Handoffs (Multi-Agent)

Handoffs transfer control to a specialised agent. The receiving agent inherits the conversation history.

from agents import Agent, Runner

billing_agent = Agent(
    name="Billing Agent",
    instructions="Handle billing questions: invoices, refunds, payment methods.",
)

technical_agent = Agent(
    name="Technical Agent",
    instructions="Handle technical issues: bugs, integrations, API errors.",
)

triage_agent = Agent(
    name="Triage Agent",
    instructions="""Classify the customer's issue and hand off to the right agent.
- Billing questions → billing_agent
- Technical questions → technical_agent
- Everything else: handle yourself.""",
    handoffs=[billing_agent, technical_agent],
)

result = Runner.run_sync(triage_agent, "My payment keeps failing with error code 402")
print(result.final_output)
# Triage agent routes to billing_agent, which handles the payment issue

The triage agent decides when to hand off, so routing is model-driven rather than rule-based. For deterministic routing, choose the agent with ordinary Python logic (before calling the Runner, or inside a tool) instead.
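A minimal sketch of rule-based pre-routing, assuming keyword rules are good enough for the well-defined categories. The `ROUTES` patterns and the `pre_route` helper are hypothetical and not part of the SDK; anything ambiguous is left to the model-driven triage agent.

```python
import re

# Hypothetical keyword rules; tune these to your own ticket categories.
ROUTES = {
    "billing": re.compile(r"\b(invoice|refund|payment|charged?)\b", re.IGNORECASE),
    "technical": re.compile(r"\b(bug|api|integration|error|crash)\b", re.IGNORECASE),
}

def pre_route(message: str) -> str:
    """Return a route name when exactly one rule matches, otherwise 'triage'."""
    matches = [name for name, pattern in ROUTES.items() if pattern.search(message)]
    # Zero matches or conflicting matches are treated as ambiguous.
    return matches[0] if len(matches) == 1 else "triage"
```

The returned name can then be used to look up the starting agent, for example in a dict that maps "billing", "technical", and "triage" to the three agents defined above, before passing that agent to Runner.run_sync.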

Guardrails

Guardrails validate inputs and outputs. An input guardrail runs on the user's input; an output guardrail runs on the agent's final output before it is returned to the user.

from agents import Agent, Runner, GuardrailFunctionOutput, input_guardrail, output_guardrail
from pydantic import BaseModel

class ContentCheck(BaseModel):
    is_safe: bool
    reason: str

@input_guardrail
async def check_safe_input(ctx, agent, input: str) -> GuardrailFunctionOutput:
    """Block requests containing PII or harmful content."""
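    # fast_safety_check is a placeholder for your own check (for example a call
    # to a cheap model); it is assumed to return an object with .safe and .reason
    check = await fast_safety_check(input)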
    return GuardrailFunctionOutput(
        output_info=ContentCheck(is_safe=check.safe, reason=check.reason),
        tripwire_triggered=not check.safe,  # True = block the request
    )

@output_guardrail
async def check_safe_output(ctx, agent, output: str) -> GuardrailFunctionOutput:
    """Ensure output doesn't leak sensitive data."""
    # detect_pii is a placeholder for your own PII detector; assumed to return a bool
    contains_pii = detect_pii(output)
    return GuardrailFunctionOutput(
        output_info=ContentCheck(is_safe=not contains_pii, reason="PII detected" if contains_pii else "OK"),
        tripwire_triggered=contains_pii,
    )

agent = Agent(
    name="Secure Agent",
    instructions="Answer user questions helpfully.",
    input_guardrails=[check_safe_input],
    output_guardrails=[check_safe_output],
)

If tripwire_triggered=True, the SDK raises InputGuardrailTripwireTriggered (for input guardrails) or OutputGuardrailTripwireTriggered (for output guardrails). Catch these to return a safe fallback response.
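The `detect_pii` helper used in the output guardrail is not defined in this note. A minimal regex-based sketch, assuming email addresses and phone-like digit runs are the only PII of interest; the patterns are illustrative and a real deployment would likely rely on a dedicated PII detection library or service.

```python
import re

# Illustrative patterns only; they will miss many real-world PII formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def detect_pii(text: str) -> bool:
    """Return True if the text contains an email address or a phone-like number."""
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))
```

Short numbers such as an error code do not trip the phone pattern, which requires at least nine digit-like characters.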

Structured Output

Force the agent to return a specific schema:

from pydantic import BaseModel
from agents import Agent, Runner

class TicketClassification(BaseModel):
    category: str        # billing | technical | refund | general
    priority: str        # low | medium | high
    sentiment: str       # positive | neutral | negative
    summary: str

classifier = Agent(
    name="Classifier",
    instructions="Classify support tickets accurately.",
    output_type=TicketClassification,
)

result = Runner.run_sync(classifier, "I've been charged twice this month and I'm furious!")
ticket = result.final_output  # type: TicketClassification
print(ticket.category)   # billing
print(ticket.sentiment)  # negative

Context and State

Pass shared state through the context parameter. The object is local to your code (it is not sent to the model) and is available to all tools and agents in the run:

from dataclasses import dataclass
from agents import Agent, Runner, RunContextWrapper, function_tool

@dataclass
class UserContext:
    user_id: str
    plan: str           # free | pro | enterprise
    account_age_days: int

@function_tool
def get_user_plan(ctx: RunContextWrapper[UserContext]) -> str:
    """Get the user's current subscription plan."""
    return f"Plan: {ctx.context.plan}, Account age: {ctx.context.account_age_days} days"

agent = Agent(
    name="Account Agent",
    instructions="Help users understand their account.",
    tools=[get_user_plan],
)

user_ctx = UserContext(user_id="u_123", plan="pro", account_age_days=180)
result = Runner.run_sync(agent, "What features do I have access to?", context=user_ctx)

Tracing

Tracing is built in and enabled by default. Traces are exported to OpenAI's platform using your OPENAI_API_KEY.

from agents import Agent, Runner, set_tracing_export_api_key

# Optional: export traces with a specific OpenAI API key
# (this sets the key used for trace export, not a custom endpoint)
set_tracing_export_api_key("your-key")

agent = Agent(name="My Agent", instructions="...")
result = Runner.run_sync(agent, "Hello")
# Visit platform.openai.com to inspect the trace

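Trace data includes: agent runs, tool calls, handoffs, input/output at each step, latency per step. The OpenAI trace dashboard is not self-hostable (unlike Langfuse). If you need self-hosted tracing, register a custom trace processor (the SDK exposes add_trace_processor and set_trace_processors) or hook into OpenTelemetry instead.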

Streaming

Stream responses to the user in real time:

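from agents import Agent, Runner
from openai.types.responses import ResponseTextDeltaEvent

agent = Agent(name="Streaming Agent", instructions="Be helpful.")

async def stream_response(query: str):
    # run_streamed returns a streaming result object directly; it is not a context manager
    result = Runner.run_streamed(agent, query)
    async for event in result.stream_events():
        # Raw events cover many types; only text delta events carry a .delta string
        if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)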

SDK vs LangGraph

Feature	OpenAI Agents SDK	LangGraph
Learning curve	Low	High
Verbosity	Minimal	Verbose
Routing	Model-driven handoffs	Explicit graph edges
State management	Simple context object	Full StateGraph with reducers
Persistence / checkpointing	None built-in	PostgresSaver, RedisSaver
Human-in-the-loop	Not built-in	interrupt(), Command
Framework lock-in	OpenAI-first; other providers via OpenAI-compatible endpoints (e.g. LiteLLM)	Any LLM via LangChain
Tracing	OpenAI platform	LangSmith (built-in), Langfuse
Production maturity	Pre-1.0 at launch (March 2025)	v1.0 (stable)

Use OpenAI Agents SDK when: you're already using OpenAI, want simple multi-agent handoffs fast, don't need checkpointing or HITL.

Use LangGraph when: you need durable state, human-in-the-loop, complex conditional routing, or provider agnosticism.

With Claude (via LiteLLM)

The SDK is OpenAI-centric but can use Claude via LiteLLM as a proxy:

from agents import Agent, Runner, OpenAIChatCompletionsModel
from openai import AsyncOpenAI

# A running LiteLLM proxy exposes Claude behind an OpenAI-compatible endpoint
litellm_client = AsyncOpenAI(
    api_key="sk-1234",  # LiteLLM proxy key
    base_url="http://localhost:4000",
)

claude_model = OpenAIChatCompletionsModel(
    model="claude-sonnet-4-6",
    openai_client=litellm_client,
)

agent = Agent(
    name="Claude Agent",
    instructions="You are a helpful assistant.",
    model=claude_model,
)

In practice: if you're using Claude natively, the Anthropic SDK with a custom agent loop (or LangGraph) is cleaner. The Agents SDK shines for OpenAI workflows.

Real-World Pattern: Research + Writer Pipeline

from agents import Agent, Runner, function_tool

@function_tool
def search_web(query: str) -> str:
    """Search the web and return relevant excerpts."""
    # web_search is a placeholder: call Perplexity, Tavily, or similar here
    return web_search(query)

researcher = Agent(
    name="Researcher",
    instructions="""Search for information on the given topic.
Return a structured research summary with sources.""",
    tools=[search_web],
)

writer = Agent(
    name="Writer",
    instructions="""Take the research provided and write a clear, concise blog post.
Target 500 words. Use the research findings directly.""",
)

# Sequential pipeline
async def research_and_write(topic: str) -> str:
    research_result = await Runner.run(researcher, f"Research: {topic}")
    research_summary = research_result.final_output

    write_result = await Runner.run(
        writer,
        f"Write a blog post based on this research:\n\n{research_summary}",
    )
    return write_result.final_output

Key Facts

Install: pip install openai-agents (released March 2025)
Four primitives: Agent, Handoff, Guardrail, Runner
Runner.run_sync for blocking; Runner.run for async; Runner.run_streamed for streaming
Routing is model-driven — for rule-based routing, use Python logic in a tool instead
tripwire_triggered=True raises InputGuardrailTripwireTriggered or OutputGuardrailTripwireTriggered
Built-in tracing exports to the OpenAI platform by default; the OpenAI dashboard is not self-hostable
No built-in checkpointing/persistence — use LangGraph if you need durable state
Can use Claude via LiteLLM proxy as an OpenAI-compatible endpoint

Common Failure Cases

Model-driven handoff routes to the wrong agent for edge-case inputs
Why: the triage agent decides which specialist to use based on semantics; ambiguous inputs may not match any agent's description clearly.
Detect: enable tracing; check which agent handled ambiguous requests; user reports receiving wrong-specialist answers.
Fix: add rule-based pre-routing for well-defined categories using a Python tool; only fall back to model-driven handoff for genuinely ambiguous cases.

Guardrail tripwire exception not caught, returning a 500 error
Why: guardrail exceptions bubble up to the caller; if the outer request handler doesn't catch them, the user gets an unhandled error.
Detect: HTTP 500 responses when requests trigger a guardrail; InputGuardrailTripwireTriggered or OutputGuardrailTripwireTriggered in the logs.
Fix: wrap Runner.run() calls in try/except for both tripwire exceptions; return a graceful "I can't help with that" response.
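The fix can be sketched as a small wrapper. To keep the example runnable without the SDK installed, the runner and the exception are stand-ins: `run_with_fallback`, `FakeTripwire`, `blocked_run`, and `ok_run` are all hypothetical names, not SDK APIs.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def run_with_fallback(
    run: Callable[[], Awaitable[T]],
    tripwire_errors: tuple[type[Exception], ...],
    fallback: T,
) -> T:
    """Await run(); return the fallback if a tripwire exception is raised."""
    try:
        return await run()
    except tripwire_errors:
        return fallback

# Stand-in exception and runners so the sketch executes on its own.
class FakeTripwire(Exception):
    pass

async def blocked_run() -> str:
    raise FakeTripwire("input guardrail tripped")

async def ok_run() -> str:
    return "Here is how to reset your password."

FALLBACK = "Sorry, I can't help with that."
blocked = asyncio.run(run_with_fallback(blocked_run, (FakeTripwire,), FALLBACK))
allowed = asyncio.run(run_with_fallback(ok_run, (FakeTripwire,), FALLBACK))
```

With the SDK, `run` would be a small coroutine that awaits Runner.run(agent, user_input) and returns result.final_output, and `tripwire_errors` would hold the SDK's guardrail tripwire exception classes. Any other exception still propagates, so genuine bugs are not hidden behind the fallback.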

Structured output (output_type) fails when model adds explanation text before JSON
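Why: some models (typically ones served without strict structured-output support) preface structured output with "Here is the JSON:" before the actual JSON block; the parser fails on the prefix.
Detect: the run fails while parsing the final output (validation errors, which the SDK surfaces as ModelBehaviorError); raw model output shows text before the JSON.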
Fix: add a system instruction: "Return only valid JSON with no preamble or explanation text"; or use a post-processing step to extract JSON from the response.
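A minimal sketch of the post-processing option, using only the standard library. `extract_first_json_object` is a hypothetical helper, and it assumes the model's output contains at most some prose around a single well-formed JSON object.

```python
import json
from typing import Any

def extract_first_json_object(text: str) -> dict[str, Any]:
    """Return the first JSON object embedded in text, ignoring surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not valid JSON from this brace; try the next one.
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")

raw = 'Here is the JSON: {"category": "billing", "priority": "high"} Hope that helps!'
parsed = extract_first_json_object(raw)
```

The resulting dict can then be validated against the Pydantic model (for example with TicketClassification.model_validate), so schema errors are still caught after the prefix is stripped.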

No built-in checkpointing means agent state is lost on process crash
Why: the SDK has no persistence layer; if the process crashes mid-run, all context and progress is lost.
Detect: long-running agents resume from the beginning after any error; no recovery mechanism.
Fix: for fault-tolerant workflows, checkpoint state to an external store (Redis/database) at key steps; or switch to LangGraph for durable state.
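A minimal sketch of step-level checkpointing, using sqlite3 from the standard library as a stand-in for Redis or a production database. `StepCheckpointer`, the table layout, and the run/step identifiers are assumptions for illustration; the SDK does not provide this class.

```python
import json
import sqlite3
from typing import Any, Optional

class StepCheckpointer:
    """Persist per-step outputs so a restarted process can skip completed steps."""

    def __init__(self, path: str = "checkpoints.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "run_id TEXT, step TEXT, payload TEXT, PRIMARY KEY (run_id, step))"
        )

    def save(self, run_id: str, step: str, payload: Any) -> None:
        # Payloads are stored as JSON text, so they must be JSON-serialisable.
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (run_id, step, json.dumps(payload)),
        )
        self.conn.commit()

    def load(self, run_id: str, step: str) -> Optional[Any]:
        row = self.conn.execute(
            "SELECT payload FROM checkpoints WHERE run_id = ? AND step = ?",
            (run_id, step),
        ).fetchone()
        return json.loads(row[0]) if row else None

store = StepCheckpointer(":memory:")  # use a file path or real database in practice
store.save("run-1", "research", {"summary": "three sources found"})
```

In a multi-step pipeline, each step would call load first and only invoke the agent when it returns None, then save the agent's final output before moving on.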

Tracing exports only to the OpenAI platform by default, leaving self-hosted observability empty
Why: the default trace exporter sends data to OpenAI's backend; teams using Langfuse or Jaeger need an alternative path.
Detect: traces don't appear in your self-hosted observability stack; only OpenAI's dashboard has data.
Fix: register a custom trace processor with the SDK, or instrument via OpenTelemetry manually by wrapping Runner.run() in an OTEL span and exporting to any OTLP-compatible backend.

Connections

agents/multi-agent-patterns — Supervisor, Swarm, Parallel fan-out patterns the SDK implements
agents/react-pattern — the single-agent loop each SDK Agent runs internally
protocols/tool-design — writing good tool descriptions (used in SDK function docstrings)
apis/openai-api — the underlying API the SDK calls
synthesis/architecture-patterns — where the Agents SDK fits in the broader pattern map
observability/platforms — alternative tracing since SDK tracing is OpenAI-platform-only

Open Questions

How does the Agents SDK handle token counting and context window management for long handoff chains?
Will OpenAI add built-in checkpointing to close the gap with LangGraph for durable workflows?
What is the actual failure rate of model-driven handoff routing vs rule-based routing in production?