Beginner · AI Engineer

Trace an LLM call end-to-end in Langfuse

Add Langfuse instrumentation to an existing chatbot or LLM script. You will create traces, nest spans for retrieval and generation steps, capture latency and token cost per call, and use the Langfuse dashboard to identify the most expensive query across a simulated 10-turn session.

Why this matters

You cannot improve what you cannot measure. Latency and cost in LLM systems are almost never distributed evenly; one query type often accounts for 40% of spend. Langfuse makes this visible in minutes, and the habit of adding traces from day one prevents the situation where a production system is burning money in a way no one can explain.

Before you start

You need a Langfuse account (cloud or self-hosted) with a public and secret API key, the Langfuse Python SDK installed, and an existing chatbot or LLM script to instrument.

Step-by-step guide

  1. Initialise the Langfuse client

    Import Langfuse and create a client using your public and secret keys. Verify the connection by running langfuse.auth_check(); it will raise an error if the credentials are wrong. Add this to your script's initialisation block (a minimal sketch appears after this list).

  2. Wrap each user turn in a trace

    For each user message, call langfuse.trace(name='chat-turn', input=user_message). This creates the top-level entry in the Langfuse UI. Pass a session_id so all 10 turns from your simulated session appear grouped together (see the per-turn sketch after this list).

  3. Add a generation span for the LLM call

    Inside the trace, call trace.generation(name='claude', model='claude-sonnet-4-6', input=messages) before your API call. After the call returns, call generation.end(output=response.content, usage=response.usage). This captures the token counts and latency for the LLM step specifically (see the generation sketch after this list).

  4. Simulate 10 turns and check the dashboard

    Run your chatbot through 10 varied queries (a driver loop is sketched after this list). Open the Langfuse dashboard, navigate to your session, and look at the timeline. Identify which turn has the highest total cost and which has the highest latency; these are usually different turns.

  5. Add a retrieval span if using RAG

    If your chatbot fetches context before calling Claude, wrap the retrieval step in trace.span(name='retrieval'). Record how many chunks were retrieved and what the search query was. Langfuse shows retrieval latency separately from generation latency; often retrieval is the bottleneck, not the model (see the RAG sketch after this list).
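
Putting it into code

The sketches below walk the five steps in Python. They assume the v2-style Langfuse Python SDK and the Anthropic Python SDK; parameter names can differ slightly between SDK versions, and every key, session id, query, and helper name is a placeholder rather than a required value.

For step 1, initialise the client once at startup. The keys can also be supplied via the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables instead of being passed in code.

    from langfuse import Langfuse

    # Placeholder keys; prefer environment variables over hard-coding in real scripts.
    langfuse = Langfuse(
        public_key="pk-lf-...",
        secret_key="sk-lf-...",
        host="https://cloud.langfuse.com",  # or your self-hosted URL
    )

    # Fails loudly at startup if the credentials are wrong.
    langfuse.auth_check()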
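
For step 2, each user message becomes its own trace, and the session_id ties the ten turns together in the dashboard (the client continues from the previous sketch).

    user_message = "What does Langfuse record for a chat turn?"  # example input

    # One trace per user turn; session_id groups the whole simulated conversation.
    trace = langfuse.trace(
        name="chat-turn",
        input=user_message,
        session_id="demo-session-01",  # any stable string for this session
    )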
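
For step 3, the generation span brackets only the Claude call, so its latency and token usage are attributed to the model rather than to the turn as a whole. The usage dict shown uses the generic input/output token format; the exact shape accepted can vary by SDK version.

    import anthropic

    claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    messages = [{"role": "user", "content": user_message}]

    # Open the generation span just before the API call...
    generation = trace.generation(name="claude", model="claude-sonnet-4-6", input=messages)
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=messages,
    )
    answer = response.content[0].text

    # ...and close it with the output and token counts once the call returns.
    generation.end(
        output=answer,
        usage={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
        },
    )
    trace.update(output=answer)  # also record the final answer on the trace itself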
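
For step 4, a small driver replays ten queries under one session id; answer_turn is just the previous two sketches folded into a function, and the queries are illustrative. langfuse.flush() matters in short scripts because events are sent in the background and can be dropped if the process exits first.

    def answer_turn(user_message: str, session_id: str) -> str:
        trace = langfuse.trace(name="chat-turn", input=user_message, session_id=session_id)
        messages = [{"role": "user", "content": user_message}]
        generation = trace.generation(name="claude", model="claude-sonnet-4-6", input=messages)
        response = claude.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
        answer = response.content[0].text
        generation.end(
            output=answer,
            usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
        )
        trace.update(output=answer)
        return answer

    queries = [
        "What is Langfuse?",                        # short and cheap
        "Summarise this 2,000-word article: ...",   # long input, a likely cost outlier
        "Translate 'good morning' into French.",
        # ... seven more varied queries to make ten turns
    ]
    for q in queries:
        answer_turn(q, session_id="demo-session-01")

    # Events are buffered and sent asynchronously; flush before exiting.
    langfuse.flush()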
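
For step 5, the retrieval span sits inside the same trace, before the generation span. retrieve_chunks below is a hypothetical stand-in for whatever vector store or search call your chatbot actually makes; recording its query and chunk count lets the dashboard show retrieval and generation latency side by side.

    def answer_turn_rag(user_message: str, session_id: str) -> str:
        trace = langfuse.trace(name="chat-turn", input=user_message, session_id=session_id)

        # Retrieval span: its latency appears alongside, not inside, the generation span.
        span = trace.span(name="retrieval", input=user_message)
        chunks = retrieve_chunks(user_message)  # hypothetical: your own retriever
        span.end(output={"num_chunks": len(chunks), "query": user_message})

        context = "\n\n".join(chunks)
        messages = [{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {user_message}",
        }]
        generation = trace.generation(name="claude", model="claude-sonnet-4-6", input=messages)
        response = claude.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
        answer = response.content[0].text
        generation.end(
            output=answer,
            usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
        )
        trace.update(output=answer)
        return answer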


